Post 303020966 by senhia83 on Wednesday 1st of August 2018 10:02:48 AM, in Shell Programming and Scripting
Parse the longest matching string

Hello experts,

I am trying to unscramble a mixed signal into component signals.

Let the list of known signals be

Code:
$ cat tmplist
DU
DU4016
GFF
GFF2010
GFF201019
G2115
G211
DU40


Let the scrambled signals (separated by "/") be

Code:
$ cat tmpsignal
([GFF201019B-//C21-DU4016/*DU/GFF2010
DU40/GFF201019-b-1-2-3-/DU4016/GFF2010/THFFF

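My desired output is each original line followed, tab-separated, by the longest known signal found in each "/"-separated field (a field containing no known signal is kept as is):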


Code:
([GFF201019B-//C21-DU4016/*DU/GFF2010	GFF201019	DU4016	DU	GFF2010
DU40/GFF201019-b-1-2-3-/DU4016/GFF2010/THFFF	DU40	GFF201019	GFF2010	THFFF

When I iterate over an array of known signals, I get the shortest matching signal (which can be a substring of a longer signal) instead of the longest:


Code:
$ awk -F"/" 'NR==FNR{a[$1];next}{t=$0; for(i=1;i<=NF;i++) { for (as in a) { if ($i~as) {$i=as}}} print t,$0}' tmplist tmpsignal
([GFF201019B-//C21-DU4016/*DU/GFF2010 GFF  DU DU GFF
DU40/GFF201019-b-1-2-3-/DU4016/GFF2010/THFFF DU GFF DU GFF THFFF

Please assist: how can I capture the longest possible match? The original data has ~20k known signals and ~20 million scrambled lines.
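
One possible approach, offered as a sketch rather than a definitive answer: the loop above ends up with the shortest signal because each match overwrites $i, so later iterations are tested against the already-shortened value. Instead of scanning all ~20k signals for every field, the sketch below enumerates the substrings of each field from longest to shortest and looks each one up in the hash of known signals, so the first hit is the longest match. The names known, maxlen, longest() and the output file tmpout are illustrative choices, not part of the original post. It assumes matches are literal substrings (no regex), that empty fields are dropped, and that a field with no known signal is printed unchanged. Note that under these assumptions the DU4016 field of the second line also yields DU4016, which the sample output in the post does not list.

```shell
# Sample data from the post (same file names as in the question).
printf '%s\n' DU DU4016 GFF GFF2010 GFF201019 G2115 G211 DU40 > tmplist
printf '%s\n' '([GFF201019B-//C21-DU4016/*DU/GFF2010' \
              'DU40/GFF201019-b-1-2-3-/DU4016/GFF2010/THFFF' > tmpsignal

awk -F'/' '
# Return the longest known signal contained in s, or s itself if none is found.
function longest(s,    n, len, pos, cand) {
    n = length(s)
    for (len = (n < maxlen ? n : maxlen); len >= 1; len--) {
        for (pos = 1; pos + len - 1 <= n; pos++) {
            cand = substr(s, pos, len)
            if (cand in known) return cand
        }
    }
    return s
}
NR == FNR {                       # first file: load the known signals
    if ($0 != "") {
        known[$0]
        if (length($0) > maxlen) maxlen = length($0)
    }
    next
}
{                                 # second file: one scrambled line per record
    out = $0
    for (i = 1; i <= NF; i++) {
        if ($i == "") continue    # skip empty fields such as the one in "//"
        out = out "\t" longest($i)
    }
    print out
}' tmplist tmpsignal > tmpout
cat tmpout
```

The work per field depends on the field length and on the longest known signal, not on how many signals are known, which is why this may scale better to ~20 million lines than a nested loop over the whole list. If two different signals of the same length occur in one field, the leftmost one wins.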
 
