I need to filter ~70 million sequence rows and would like to use awk, under these conditions:
1) use the longest string of each pattern in column 2 as the index, ignoring any substring of it;
2) keep all the unique patterns that remain after 1);
3) print the whole row;
input:
output:
I first picked out only the unique patterns in column 2
and the file shrank to fewer than ~5 million rows. I am not sure this is doable with awk, and I need some expertise for the second step: picking the longest string of each pattern.
Thanks a lot in advance!
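One possible approach to the second step, as a rough sketch only: sort the rows by the length of column 2, longest first, then keep a row only if its column 2 is not contained in any string already kept. The sample input and output were not shown, so this assumes whitespace-separated rows with the sequence in column 2 and no tab characters issue beyond normal fields; the function name longest_only is made up for illustration. The containment check compares each row against every kept string, so on ~5 million rows it may be slow.

```shell
# Hypothetical helper: print whole rows whose column 2 is not a substring
# of a longer (or identical, already kept) column-2 string.
# Assumes whitespace-separated fields with the sequence in column 2.
longest_only() {
    # Prefix each row with the length of column 2, sort longest first,
    # then strip the prefix again.
    awk '{ print length($2) "\t" $0 }' "$1" |
    sort -t "$(printf '\t')" -k1,1nr |
    cut -f2- |
    awk '{
        # Longer strings were seen earlier, so any string containing $2
        # is already in kept[] if it exists at all.
        for (i = 1; i <= n; i++)
            if (index(kept[i], $2)) next
        kept[++n] = $2
        print
    }'
}
```

It would be called as `longest_only file`, where file is the de-duplicated input. Rows come out ordered by the length of column 2, not in their original order.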
Hi All,
I am new to shell scripting and am stuck on a problem. Can anyone please help me out?
Requirement : Need to get the index of a substring from a parent string
Eg : index("Sandy","dy") should return 4 or 3.
My Approach :
I used Awk function index to... (2 Replies)
Hi, I have a file with data as shown below. Here I need to remove duplicate rows in such a way that
it checks only columns 2, 3, 4 and 5 for duplicates. When deleting duplicates, the largest row, i.e. the one with the most columns holding values, should be retained. Then it must remove duplicates such that by... (11 Replies)
Hello all,
I need to find the longest string in a select field and print that field.
I have tried a few different methods and I always end up one step from where I need to be.
Methods thus far:
nawk '{if (length($1) > long) long=length($1); if(length($1)==long) print $1}'
The above... (6 Replies)
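A one-pass command like the one above cannot know the maximum length until all input has been read, so it prints every value that was the longest so far. A minimal sketch of one way around that, assuming the goal is to print only the field-1 values that share the overall maximum length (the function name longest_field1 is made up here, and plain awk is used where the post uses nawk):

```shell
# Hypothetical helper: print only the field-1 values of maximum length.
# Candidates are buffered and printed in END, once the maximum is known.
longest_field1() {
    awk '{
        len = length($1)
        if (len > max) { max = len; n = 0 }   # new maximum: drop old candidates
        if (len == max) best[++n] = $1        # remember every value tied for longest
    }
    END { for (i = 1; i <= n; i++) print best[i] }' "$@"
}
```

It reads the named files, or standard input if none are given.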
I would be grateful if someone could help me. I am trying to write a .sh script in UNIX.
I have the following code;
User=john
User=james
User=ian
User=martin
for x in ${User}
do
print ${#x}
done
This produces the following output;
4
5
3
6 (12 Replies)
Hi,
I am trying to figure out how to get the length of the longest column in the entire file (because the length varies from one row to the other)
I was doing this at first to check how many fields I have for the first row:
awk '{print NF; exit}' file
Now, I can do this:
awk '{ if... (4 Replies)
I am trying to search for a given text in a file and find the index of its last occurrence. The task is to append that index to the same file, in a separate column. I have accomplished the task partially and am looking for a complete solution.
Following is the detailed description:
names_file.txt
... (17 Replies)
I want to join the values in the second column onto a single line for each unique value in the first column.
My input
jvm01, Web 2.0 Feature Pack Library
jvm01, IBM WebSphere JAX-RS
jvm01, Custom01 Shared Library
jvm02, Web 2.0 Feature Pack Library
jvm02, IBM WebSphere JAX-RS
jvm03, Web 2.0 Feature... (10 Replies)
Hi All,
I have a file like this (with 2 columns).
Column 1: like a,b,c....
Column 2: having numbers.
I want to segregate those numbers based on column 1.
Example:
file.
a 5
b 9
b 620
a 710
b 230
a 330
b 1910 (4 Replies)
Hello experts,
I am trying to unscramble a mixed signal into component signals.
Let the list of known signals be
$ cat tmplist
DU
DU4016
GFF
GFF2010
GFF201019
G2115
G211
DU40 (1 Reply)
Hi,
Let's say I have a pipe-separated input like so:
name_10|A|BCCC|cat_1
name_11|B|DE|cat_2
name_10|A|BC|cat_3
name_11|B|DEEEEEE|cat_4
Using awk, for records with common field 2, I am trying to replace all the shortest substrings by the longest string in field 3.
In order to get the... (5 Replies)