I need to filter ~70 million sequence rows and would like to use awk, with these conditions:
1) for each pattern in column 2, keep only the longest string as the index and ignore any sub-string of it;
2) keep all the unique patterns that remain after 1);
3) print the whole row;
input:
output:
I first picked out only the unique patterns of column 2
and the file shrank to fewer than ~5 million rows. I am not sure this is doable with awk, and I need some expertise for the second step: picking the longest of each pattern.
Thanks a lot in advance!
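A minimal sketch of the second step, assuming whitespace-separated rows with the sequence in column 2 (the sample input and output were not shown, so the data below is made up, and `keep_longest` is only an illustrative name). It keeps the first row for each unique column-2 string, then drops every string that occurs inside another one. The comparison is quadratic in the number of unique patterns, so for ~5 million of them it is likely too slow as written; it is meant to show the logic only.

```shell
# keep_longest: print one row per column-2 string that is not
# contained in any other column-2 string (hypothetical helper).
keep_longest() {
  awk '
    # Remember the first row seen for each unique column-2 value.
    !($2 in row) { row[$2] = $0; key[++n] = $2 }
    END {
      for (i = 1; i <= n; i++) {
        keep = 1
        # Keys are unique, so any other key containing key[i] is longer.
        for (j = 1; j <= n && keep; j++)
          if (i != j && index(key[j], key[i]) > 0) keep = 0
        if (keep) print row[key[i]]
      }
    }
  ' "$@"
}

# Made-up sample: ACGT and CGT are sub-strings of ACGTTT and are dropped.
printf 'r1 ACGT\nr2 ACGTTT\nr3 GGC\nr4 ACGTTT\nr5 CGT\n' | keep_longest
```

With the sample above this prints the `r2` and `r3` rows. Sorting the unique patterns by length first would let the inner loop compare only against longer strings.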
Hi All,
I am new to shell scripting and am stuck on a problem. Can anyone please help me out?
Requirement: I need to get the index of a substring within a parent string.
E.g. index("Sandy","dy") should return 4 (counting from 1) or 3 (counting from 0).
My Approach :
I used Awk function index to... (2 Replies)
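For reference, a small sketch of how awk's `index` behaves, since the excerpt is cut off before the actual attempt. The wrapper name `substr_index` is made up for illustration. `index(s, t)` counts from 1 and returns 0 when `t` does not occur in `s`.

```shell
# substr_index STRING SUBSTRING: print the 1-based position of
# SUBSTRING in STRING, or 0 if it is absent (hypothetical helper).
substr_index() {
  awk -v s="$1" -v t="$2" 'BEGIN { print index(s, t) }'
}

substr_index Sandy dy   # prints 4
substr_index Sandy xy   # prints 0
```

For a 0-based position, subtract 1 only when the result is non-zero, otherwise "not found" would turn into -1.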
Hi, I have a file with data as shown below. I need to remove duplicate rows in such a way that
only columns 2, 3, 4 and 5 are checked for duplicates. When deleting duplicates, the largest row should be retained, i.e. the one with the most columns holding values. Then it must remove duplicates such that by... (11 Replies)
Hello all,
I need to find the longest string in a select field and print that field.
I have tried a few different methods and I always end up one step from where I need to be.
Methods thus far:
nawk '{if (length($1) > long) long=length($1); if(length($1)==long) print $1}'
The above... (6 Replies)
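The one-liner above prints while it is still scanning, so every value that was the longest so far gets printed. One possible fix, sketched here under the assumption that all values tied for the maximum length should be printed, is to collect candidates and print only in `END`. The function name `longest_in_field1` is invented for the example, and plain `awk` is used in place of `nawk`.

```shell
# longest_in_field1: print the longest value(s) found in field 1.
longest_in_field1() {
  awk '
    NF == 0           { next }                      # skip blank lines
    length($1) > max  { max = length($1); n = 0 }   # new maximum: reset the list
    length($1) == max { best[++n] = $1 }            # keep every value of that length
    END { for (i = 1; i <= n; i++) print best[i] }
  ' "$@"
}

printf 'ab\nabcd\nabc\nwxyz\n' | longest_in_field1
```

For this input it prints `abcd` and `wxyz`, each on its own line.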
I would be grateful if someone could help me. I am trying to write a .sh script in UNIX.
I have the following code;
User=john
User=james
User=ian
User=martin
for x in ${User}
do
print ${#x}
done
This produces the following output;
4
5
3
6 (12 Replies)
Hi,
I am trying to figure out how to get the length of the longest column in the entire file (because the length varies from one row to the other)
I was doing this at first to check how many fields I have for the first row:
awk '{print NF; exit}' file
Now, I can do this:
awk '{ if... (4 Replies)
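If "longest" here means the largest number of fields on any row, which is only a guess since the excerpt is cut off, a sketch could track the maximum `NF` and print it at the end. `max_fields` is a made-up name.

```shell
# max_fields: print the largest field count (NF) seen on any line.
max_fields() {
  awk 'NF > max { max = NF } END { print max + 0 }' "$@"
}

printf 'a b\na b c d\na\n' | max_fields   # prints 4
```

The `+ 0` makes the result print as 0 rather than an empty string when the input is empty.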
I am trying to search for a given text in a file and find the index of its last occurrence. The task is to append that index to the same file, in a separate column. I have accomplished the task partially and am looking for a complete solution.
Following is the detailed description:
names_file.txt
... (17 Replies)
I want to bring the values in the second column onto a single line for each unique value in the first column.
My input
jvm01, Web 2.0 Feature Pack Library
jvm01, IBM WebSphere JAX-RS
jvm01, Custom01 Shared Library
jvm02, Web 2.0 Feature Pack Library
jvm02, IBM WebSphere JAX-RS
jvm03, Web 2.0 Feature... (10 Replies)
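The desired output is not shown in the excerpt, so the following is only a sketch under two assumptions: fields are separated by a comma plus a space, and the joined values should follow the key on one line, separated the same way. `join_by_key` is an invented name.

```shell
# join_by_key: collapse rows sharing the same first field into one line,
# keeping keys in order of first appearance.
join_by_key() {
  awk -F', ' '
    !($1 in vals) { order[++n] = $1; vals[$1] = $2; next }   # first value for this key
                  { vals[$1] = vals[$1] ", " $2 }            # append later values
    END { for (i = 1; i <= n; i++) print order[i] ", " vals[order[i]] }
  ' "$@"
}

printf 'jvm01, Web 2.0 Feature Pack Library\njvm01, IBM WebSphere JAX-RS\njvm02, IBM WebSphere JAX-RS\n' | join_by_key
```

A value that itself contains a comma followed by a space would be cut at that point, because only `$2` is used.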
Hi All,
I have a file like this (with 2 columns).
Column 1: like a,b,c....
Column 2: having numbers.
I want to segregate those numbers based on column 1.
Example:
file.
a 5
b 9
b 620
a 710
b 230
a 330
b 1910 (4 Replies)
Hello experts,
I am trying to unscramble a mixed signal into component signals.
Let the list of known signals be
$ cat tmplist
DU
DU4016
GFF
GFF2010
GFF201019
G2115
G211
DU40 (1 Reply)
Hi,
Let's say I have a pipe-separated input like so:
name_10|A|BCCC|cat_1
name_11|B|DE|cat_2
name_10|A|BC|cat_3
name_11|B|DEEEEEE|cat_4
Using awk, for records with common field 2, I am trying to replace all the shortest substrings by the longest string in field 3.
In order to get the... (5 Replies)
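One way to approach this, sketched under the assumption that every field-3 value within a field-2 group is a sub-string of that group's longest value (as in the four sample rows): remember the longest field 3 per field 2 while buffering the rows, then rewrite field 3 at the end. `fill_longest` is a made-up name, and the whole input is held in memory.

```shell
# fill_longest: replace field 3 with the longest field 3 seen
# among rows that share the same field 2.
fill_longest() {
  awk -F'|' -v OFS='|' '
    {
      line[NR] = $0; key[NR] = $2
      if (length($3) > length(best[$2])) best[$2] = $3   # track longest per group
    }
    END {
      for (i = 1; i <= NR; i++) {
        $0 = line[i]          # re-split the saved row
        $3 = best[key[i]]     # assigning a field rebuilds the row with OFS
        print
      }
    }
  ' "$@"
}

printf 'name_10|A|BCCC|cat_1\nname_11|B|DE|cat_2\nname_10|A|BC|cat_3\nname_11|B|DEEEEEE|cat_4\n' | fill_longest
```

On the sample rows this turns `DE` into `DEEEEEE` and `BC` into `BCCC`, leaving the other fields untouched. It does not check that a shorter value really is a sub-string of the longest one.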
Discussion started by: beca123456