I have been searching the net for a solution to my problem, but unfortunately have found nothing so far.
I want to sort a tab-delimited file on more than one column. If the column I sort on has no value, I want to keep the line; lines that do have a value should be kept only once per unique value.
I have tried the options:
but here I lose the lines that have nothing in the 12th column...
another option:
Here it looks like "!seen[$12]++}" does nothing and the output is empty.
I want to keep all lines but have only the unique ones by the 5th column and by the 12th column, meaning that lines with no value in the 12th column should be kept.
In more detail:
# My data set:
I want to sort by the 5th and the 12th column and have no duplicates in either of them.
The 5th column is the method hit number (for example cd/G3D/PF etc.) and the 12th is the InterPro hit number (IPR).
So the output should contain lines that are unique by the 5th column and by the 12th column, even if the 12th is empty, like here:
Thanks for reading until here!
Hope someone will have a solution for that!
Of course I could use a solution of more than one line, but a one-line solution would be better...
Thanks a lot!
Last edited by ksenia; 02-19-2019 at 10:29 AM..
Reason: too long
The original command contains a grep, which will reduce the number of lines in the output shown. Not sure how much this helps (I added tabs manually to the post to test):
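A minimal sketch of this kind of filter, assuming the goal is "keep the first line for each 5th-column value, and among those keep a line only if its 12th column is new or empty". The `row` helper, the array names `seen5` and `seen12`, and the sample values are made up for illustration; this is not the poster's data and not necessarily the exact command that was posted.

```shell
# Hypothetical helper: emit one 12-field, tab-separated line with the given
# values in field 5 and field 12 (all other fields are filler).
row() { printf 'a\tb\tc\td\t%s\tf\tg\th\ti\tj\tk\t%s\n' "$1" "$2"; }

result=$(
  {
    row cd001 IPR001   # kept: first cd001, first IPR001
    row cd001 IPR001   # dropped: field 5 already seen
    row G3D01 ''       # kept: new field 5, empty field 12
    row PF001 ''       # kept: new field 5, field 12 empty again
    row PF002 IPR001   # dropped: IPR001 already seen and not empty
  } | awk -F'\t' '!seen5[$5]++ && (!seen12[$12]++ || !length($12))'
)
printf '%s\n' "$result"
```

With this sample, three lines come through (cd001, G3D01 and PF001 in field 5). Because `&&` short-circuits, a line whose 5th field was already seen never reaches the 12th-column test, so its 12th field is not recorded in `seen12`.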
These 4 Users Gave Thanks to rdrtx1 For This Post:
Amazing, it does exactly what I was looking for.
Thanks a lot!
--- Post updated at 04:21 PM ---
Can you please give a walk-through of "(!seen[$12]++ || ! length($12))"?
Thanks
Moderator's Comments:
Please wrap your samples and code in [CODE]....[/CODE] tags in all your posts, as per the forum's rules.
Hello ksenia,
Could you please go through the following and let me know if this helps you.
This is only for understanding purposes; I haven't run it to check whether it still works with the comments in place (fair warning here).
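As a rough illustration of how "(!seen[$12]++ || ! length($12))" evaluates line by line, here is a small sketch that prints each half of the condition separately. The `line` helper, the variable names `first` and `blank`, and the input values are hypothetical and only there to make the example self-contained.

```shell
# Hypothetical input: eleven empty fields, then a value in field 12.
line() { printf '\t\t\t\t\t\t\t\t\t\t\t%s\n' "$1"; }

walk=$(
  { line IPR001; line ''; line IPR001; line ''; line IPR002; } |
  awk -F'\t' '{
    first = !seen[$12]++   # seen[$12]++ yields the old count, so this is 1 only on the first occurrence
    blank = !length($12)   # 1 when field 12 is the empty string
    printf "%s first=%d blank=%d keep=%d\n", ($12 == "" ? "(empty)" : $12), first, blank, (first || blank)
  }'
)
printf '%s\n' "$walk"
```

Only the third line (the repeated IPR001) ends up with keep=0: it is neither a first occurrence nor empty. The second empty line is no longer "first", but `!length($12)` is true, so the `||` still keeps it.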
Thanks,
R. Singh
These 2 Users Gave Thanks to RavinderSingh13 For This Post:
I must be missing something very basic here. If the input file has tab separated fields (as stated in post #1 in this thread), then why is any pipeline needed here? Why not just use:
Note that the $(printf '\t') in the above can be replaced by a single literal <tab> character.
This will produce at least one line of output that is not in the output you say you want, which is the line in your sample input file:
which has empty fields for both field #5 and field #12. Since this is a unique value for that pair of fields, it seems to meet your criteria and should be displayed, shouldn't it?
This is untested since the sample input file provided did not contain any <tab>s and I wasn't sure which <space>s should be replaced by <tab>s.
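For illustration, a pipeline-free command along these lines could be sketched as below, assuming the requirement is one line per unique pair of field 5 and field 12. The `row` helper and sample values are hypothetical, and `LC_ALL=C` is added here only to make the ordering predictable.

```shell
# Hypothetical helper: one 12-field, tab-separated line with the given
# values in field 5 and field 12.
row() { printf 'a\tb\tc\td\t%s\tf\tg\th\ti\tj\tk\t%s\n' "$1" "$2"; }

tab=$(printf '\t')   # a literal <tab> to use as the field separator

sorted=$(
  {
    row cd001 IPR001
    row cd001 IPR001   # same pair as the line above, so -u drops it
    row cd001 ''       # same field 5, but empty field 12: a different pair
    row '' ''          # both fields empty: also a unique pair
    row PF001 IPR001
  } | LC_ALL=C sort -t "$tab" -u -k5,5 -k12,12
)
printf '%s\n' "$sorted"
```

Four of the five sample lines survive, including the one where both key fields are empty, which is the behaviour described above. Note that `sort -u` compares only the listed keys, so which of two lines with an equal key pair is kept depends on input order.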