Shell Programming and Scripting: post 303020966 by senhia83 on Wednesday 1st of August 2018 10:02:48 AM
Parse the longest matching string

Hello experts,

I am trying to unscramble a mixed signal into component signals.

Let the list of known signals be

Code:
$ cat tmplist
DU
DU4016
GFF
GFF2010
GFF201019
G2115
G211
DU40


Let the scrambled signals (separated by "/") be

Code:
$ cat tmpsignal
([GFF201019B-//C21-DU4016/*DU/GFF2010
DU40/GFF201019-b-1-2-3-/DU4016/GFF2010/THFFF

My desired output is


Code:
([GFF201019B-//C21-DU4016/*DU/GFF2010	GFF201019	DU4016	DU	GFF2010
DU40/GFF201019-b-1-2-3-/DU4016/GFF2010/THFFF	DU40	GFF201019	DU4016	GFF2010	THFFF

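When I iterate over an array of known signals, I get the shortest matching signal (which can be a substring of a longer signal). Each match overwrites the field, so the longer signals stop matching while their shorter substrings still do: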


Code:
$ awk -F"/" 'NR==FNR{a[$1];next}{t=$0; for(i=1;i<=NF;i++) { for (as in a) { if ($i~as) {$i=as}}} print t,$0}' tmplist tmpsignal
([GFF201019B-//C21-DU4016/*DU/GFF2010 GFF  DU DU GFF
DU40/GFF201019-b-1-2-3-/DU4016/GFF2010/THFFF DU GFF DU GFF THFFF

Please assist: how can I catch the longest possible match? The original data has ~20k known signals and ~20 million scrambled lines.
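
One possible approach, offered as a sketch rather than a tested solution for the full data set: instead of looping over every known signal and regex-matching it against each field, store the signals as array keys and look up substrings of each field, longest first. The first hit is then the longest match, and the cost per field depends on the field length and the longest signal, not on the number of known signals. The function name longest_match and the variable names are made up for illustration. The sketch assumes signals should match literally (not as regular expressions), that empty fields are skipped, that a field with no known signal is printed unchanged (as THFFF is in the desired output), and that the leftmost match wins when two signals of equal length both occur in a field.

```shell
# Hypothetical helper: $1 = file of known signals, $2 = file of scrambled lines.
longest_match() {
    awk -F/ '
    NR == FNR {                       # first file: load the known signals
        if ($0 != "") {
            known[$0]                 # signal stored as an array key
            if (length($0) > maxlen) maxlen = length($0)
        }
        next
    }
    {
        out = $0                      # keep the original line as column 1
        for (i = 1; i <= NF; i++) {
            if ($i == "") continue    # skip empty fields such as the one in "//"
            best = $i                 # fallback: no known signal in this field
            found = 0
            n = length($i)
            # try the longest candidate length first, then shorter ones
            for (len = (n < maxlen ? n : maxlen); len >= 1 && !found; len--)
                for (s = 1; s + len - 1 <= n; s++)
                    if (substr($i, s, len) in known) {
                        best = substr($i, s, len)
                        found = 1
                        break
                    }
            out = out "\t" best
        }
        print out
    }' "$1" "$2"
}
```

Called as longest_match tmplist tmpsignal on the sample files above, this should print each original line followed by one tab-separated result per non-empty field. Note that it reports DU4016 for the third field of the second sample line, since that field is itself a known signal.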
 
