This worked perfectly with the \t introduced in the new code. Yes, my fields are separated by tabs. Awesome stuff, Bartus11! Once again, if you could please break the code down and comment on it, that would make it simpler for me to understand and apply elsewhere. This code has a special <=> operator; what does it do?
After giving it a little thought, I made a little optimisation to that code:
I replaced the hash (associative array) with a regular array (@h), as it is enough for this task.
$h[$.]=$_ : load each line into array @h, indexed by line number
print : print whatever the sort function returns
sort{...}@h : sort the @h array (you have to read about sorting arrays in Perl, as it is too extensive a subject for a short post)
split "[\t ]+",$a : split the first element of the compared pair, using runs of TABs and spaces as the field separator
(split "[\t ]+",$a)[8] : take the 9th field from the array output by that split
(split ";",(split "[\t ]+",$a)[8])[2] : split that 9th field using ";" as separator and take the 3rd field from the resulting array
(split "=",(split ";",(split "[\t ]+",$a)[8])[2])[1] : split that field (now it contains something like: coverage=43) using "=" as separator, and take the 2nd field. So basically this whole expression cuts the value of "coverage" from the line.
The same happens with the second element of the compared pair ($b): (split "=",(split ";",(split "[\t ]+",$b)[8])[2])[1]
When both values have been extracted, the comparison itself takes place by means of the "<=>" operator. You can read about that operator in "Learning Perl". It returns -1, 0 or 1 depending on whether the left value is numerically less than, equal to or greater than the right one, which is why it is mostly used inside the "sort" function to sort an array numerically.
@Bartus11
Hi, hope you are well.
Can I ask you another question, continuing with the same data set for which you have kindly provided other answers?
Last time you helped me sort the data according to any ";"-delimited parameter of choice in the last field of each line. This time I want to know details about the genotype parameter in this field.
So, taking the above example, what I want to know is the count of each type of genotype. So my expected output for the above would be:
Genotype is always denoted by a capital A-Z letter, so I reckon the regex can be restricted to that pattern.
This uses most of the code from last solution:
Cascaded splits are used to get the genotype value. It is then used to populate the hash %h, with the genotype as the key and the number of occurrences as the value. In the "END" section, the contents of the %h hash are printed.
I have a directory of files, I can show the number of lines in each file and order them from lowest to highest with:
wc -l *|sort
15263 Image.txt
16401 reference.txt
40459 richtexteditor.txt
How can I also print the number of unique lines in each file?
15263 1401 Image.txt
16401... (15 Replies)
I would like to print unique lines without sort or unique. Unfortunately the server I am working on does not have sort or unique. I have not been able to contact the administrator of the server to ask him to add it for several weeks. (7 Replies)
file 1
Sun Mar 17 00:01:33 2013 submit , Name="1234"
Sun Mar 17 00:01:33 2013 submit , Name="1344"
Sun Mar 17 00:01:33 2013 submit , Name="1124"
..
..
..
..
Sun Mar 17 00:01:33 2013 submit , Name="8901"
file 2
Sun Mar 17 00:02:47 2013 1234 execute SUCCEEDED
Sun Mar 17... (24 Replies)
Hello everyone,
Maybe somebody could help me with an awk script.
I have this input (field separator is comma ","):
547894982,M|N|J,U|Q|P,98,101,0,1,1
234900027,M|N|J,U|Q|P,98,101,0,1,1
234900023,M|N|J,U|Q|P,98,54,3,1,1
234900028,M|H|J,S|Q|P,98,101,0,1,1
234900030,M|N|J,U|F|P,98,101,0,1,1... (2 Replies)
hi
My problem is a little complicated. I have 2 files which look like this
file 1
abbsss:aa:22:34:as akl abc 1234
mkilll:as:ss:23:qs asc abc 0987
mlopii:cd:wq:24:as asd abc 7866
file2
lkoaa:as:24:32:sa alk abc 3245
lkmo:as:34:43:qs qsa abc 0987
kloia:ds:45:56:sa acq abc 7805
i... (5 Replies)
hi
I have used comm -13 <(sort 1.txt) <(sort 2.txt) to get the unique lines that are present in file 2 but not in file 1, but somehow I am getting the entire file 2. I would expect a few, but not all, uncommon lines from my data. Is there anything wrong with the way I used the command?
my... (1 Reply)
Hi friends,
I have multiple files. For now, let's say I have two of the following style
cat 1.txt
cat 2.txt
output.txt
Please note that my files are not sorted and in the output file I need another extra column that says the file from which it is coming. I have more than 100... (19 Replies)
Hi All,
I have a very large file (4GB) which has duplicate lines. I want to delete the duplicate lines, leaving only unique lines. sort, uniq and awk '!x++' are not working, as they run out of buffer space.
I dont know if this works : I want to read each line of the File in a For Loop, and want to... (16 Replies)
Hello,
I have a bash shell script and I use awk to print certain columns of one file and direct the output to another file. If I do a less or cat on the file it looks correct, but if I email the file and open it with Outlook the lines outputted by awk are concatenated.
Here is my awk line:... (6 Replies)