Does anyone know an easy way to filter this type of file? I want to keep every line whose score (column 2) is 100.00 and get rid of duplicates (for example gi|332198263|gb|EGK18963.1| below), so I guess uniq could be used for this?
These are the lines I want to keep.
In the end, I want the output to look like this. Is it possible to use sed for this?
OUTPUT:
Can anyone please help? Thanks so much in advance!
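One possible approach, as a minimal sketch: awk can do the score filter and the de-duplication in a single pass, which is simpler than combining sed with uniq (uniq only drops adjacent duplicate lines). Since the sample data is not shown here, this assumes whitespace-separated fields with the ID (such as gi|332198263|gb|EGK18963.1|) in column 1 and the score in column 2. The function name filter_top_hits is made up for illustration.

```shell
# Sketch only. Assumptions: fields are whitespace-separated, the ID is in
# column 1 and the score in column 2. filter_top_hits is a hypothetical name.
filter_top_hits() {
    # Keep a line if column 2 is exactly the string "100.00" and its
    # column-1 ID has not been printed before (first occurrence wins).
    awk '$2 == "100.00" && !seen[$1]++' "$@"
}
```

Usage would be something like filter_top_hits input.txt > filtered.txt, where input.txt is a placeholder name. If the ID lives in another column, change $1 accordingly.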
I have the following command, which recursively lists the files larger than 0.01 GB in a directory structure, together with their sizes in GB.
ls -l -R | awk '{ if ($5/1073741824 >= 0.01) print $9, $5/1073741824 }'
But there are some files for which I don't have sufficient permissions; after executing this script it
gives me... (1 Reply)
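The rest of this post is cut off, so the following is only a guess at the problem: if the trouble is "Permission denied" noise from unreadable files and directories, one option is to let find walk the tree and discard its error stream. This is a sketch, not the poster's script; big_files and its two parameters are hypothetical names, and like the original command it takes the file name from field 9 of ls -l, so names containing spaces are cut short.

```shell
# Sketch only. big_files, dir and min_gb are hypothetical names.
# Lists regular files under dir (default .) that are at least min_gb GiB
# (default 0.01), hiding error messages from unreadable paths.
big_files() {
    dir=${1:-.}
    min_gb=${2:-0.01}
    # find visits only regular files; 2>/dev/null drops permission errors.
    find "$dir" -type f -exec ls -l {} + 2>/dev/null |
        awk -v min="$min_gb" '$5 / 1073741824 >= min { print $9, $5 / 1073741824 }'
}
```

Calling big_files /some/dir 0.5 would use a 0.5 GiB threshold instead of the default.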
Hi All,
I have the input and expected output below. I need code that scans through this input file and, if the number in column 1 is greater than 1, prints the whole line; otherwise it prints "No Re-occurrence". Can anybody help?
Input:
1 vvvvv 20 7 7 23 0 64
6 zzzzzz 11 5... (7 Replies)
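The rule described in this post can be sketched in awk as follows. The expected output is cut off here, so this assumes one output line per input line; flag_reoccurrence is a made-up name.

```shell
# Sketch only. Assumption: one output line per input line.
# flag_reoccurrence is a hypothetical name.
flag_reoccurrence() {
    awk '{
        if ($1 + 0 > 1)
            print                      # column 1 > 1: print the whole line
        else
            print "No Re-occurrence"   # otherwise print the fixed message
    }' "$@"
}
```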
file1 contains (this is just a small sample of the data; it may have thousands of lines):
1 aaa 1/01/1975 delhi
2 bbb 2/03/1977 mumbai
3 ccc 1/01/1975 mumbai
4 ddd 2/03/1977 chennai
5 aaa 1/01/1975 kolkatta
6 bbb 2/03/1977 bangalore
program:
nawk '{
idx= $2 SUBSEP $3
arr = (idx in arr) ?... (2 Replies)
Hi,
I have this scenario, where there are two classes: apple and orange.
1,2,3,4,5,6,apple
1,1,0,4,2,3,apple
1,3,3,3,3,4,apple
1,1,1,1,1,1,orange
1,2,3,1,1,1,orange
Basically, for apple I have 3 entries in the file, and for orange I have 2 entries. I'm trying to edit the file and find... (5 Replies)
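What the poster wants to find is cut off, but the per-class counts mentioned above (3 for apple, 2 for orange) can be reproduced with a short awk sketch. It assumes the class label is always the last comma-separated field; count_per_class is a made-up name.

```shell
# Sketch only. Assumption: the class label is the last comma-separated field.
# count_per_class is a hypothetical name.
count_per_class() {
    # Tally lines per label; sort makes the output order deterministic,
    # since awk's "for (c in n)" order is unspecified.
    awk -F, '{ n[$NF]++ } END { for (c in n) print c, n[c] }' "$@" | sort
}
```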
Hello Gurus,
Please help me out with this problem. I have an input file as below:
input clock;
input a; //reset all
input b;
//input comment
output c;
output d;
output e;
input f;
//output comment
I need the output as follows:
\\Inputs (1 Reply)
Hi All,
After sorting directories and files I got the output below. Now I only want the strings that are common to them, so the actual output should be as shown at the bottom. How do I do that?
Thanks
-adsi
File to be modified:-
Common Components for ----> AA... (4 Replies)
Hello,
I have a log file that contains the following output:
LAP.sun5 CC
LAP.sun5 CQ
perl.sun5 CC
perl.sun5 CQ
TSLogger.sun5 CC
TSLogger.sun5 CQ
TSLogger.sun5 KR
WAS.sun5 CC
WAS.sun5 MT
WAS.sun5 CQ
I want the output to be in the way shown below; I tried using awk but could not do it. ... (12 Replies)
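The desired output is cut off here, so this is only one plausible reading: collect all codes from column 2 onto a single line per name in column 1, keeping names in order of first appearance. group_by_first is a made-up name, and the layout is an assumption.

```shell
# Sketch only. Assumption: the goal is one line per column-1 name, followed
# by all of its column-2 codes. group_by_first is a hypothetical name.
group_by_first() {
    awk '
        # First time a name is seen: remember its position and first code.
        !($1 in codes) { order[++n] = $1; codes[$1] = $2; next }
        # Later occurrences: append the code to the existing list.
        { codes[$1] = codes[$1] " " $2 }
        END { for (i = 1; i <= n; i++) print order[i], codes[order[i]] }
    ' "$@"
}
```

With the sample above, the first output line would be LAP.sun5 CC CQ.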
Hi,
I have several files that look like this:
File1.txt
Data1
Data2
Data20
File2.txt
Data1
Data5
Data10
File3.txt
Data1
Data2
Data17
File4.txt (6 Replies)
Hi,
I have some data as shown below.
format : apple(hhmm mm/dd).fruit
apple(2345 03/25).fruit
apple(2345 05/06).fruit
orange(0443 05/02).fruit
orange(0345 05/05).fruit
orange(2134 05/04).fruit
grape(0930 04/24).fruit
grape(2330 03/30).fruit
I need to get the data which are... (1 Reply)
FASTX_QUALITY_STATS(1) User Commands FASTX_QUALITY_STATS(1)
NAME
fastx_quality_stats - FASTX Statistics
DESCRIPTION
usage: fastx_quality_stats [-h] [-N] [-i INFILE] [-o OUTFILE]
Part of FASTX Toolkit 0.0.13.2 by A. Gordon (gordon@cshl.edu)
[-h] = This helpful help screen.
[-i INFILE] = FASTQ input file. default is STDIN.
[-o OUTFILE] = TEXT output file. default is STDOUT.
[-N] = New output format (with more information per nucleotide/cycle).
The *OLD* output TEXT file will have the following fields (one row per column):
column = column number (1 to 36 for a 36-cycles read solexa file)
count = number of bases found in this column.
min = Lowest quality score value found in this column.
max = Highest quality score value found in this column.
sum = Sum of quality score values for this column.
mean = Mean quality score value for this column.
Q1 = 1st quartile quality score.
med = Median quality score.
Q3 = 3rd quartile quality score.
IQR = Inter-Quartile range (Q3-Q1).
lW = 'Left-Whisker' value (for boxplotting).
rW = 'Right-Whisker' value (for boxplotting).
A_Count = Count of 'A' nucleotides found in this column.
C_Count = Count of 'C' nucleotides found in this column.
G_Count = Count of 'G' nucleotides found in this column.
T_Count = Count of 'T' nucleotides found in this column.
N_Count = Count of 'N' nucleotides found in this column.
max-count = max. number of bases (in all cycles)
The *NEW* output format:
cycle (previously called 'column') = cycle number
max-count
For each nucleotide in the cycle (ALL/A/C/G/T/N):
count = number of bases found in this column.
min = Lowest quality score value found in this column.
max = Highest quality score value found in this column.
sum = Sum of quality score values for this column.
mean = Mean quality score value for this column.
Q1 = 1st quartile quality score.
med = Median quality score.
Q3 = 3rd quartile quality score.
IQR = Inter-Quartile range (Q3-Q1).
lW = 'Left-Whisker' value (for boxplotting).
rW = 'Right-Whisker' value (for boxplotting).
SEE ALSO
The quality of this automatically generated manpage might be insufficient. It is suggested to visit
http://hannonlab.cshl.edu/fastx_toolkit/commandline.html
to get a better layout as well as an overview about connected FASTX tools.
fastx_quality_stats 0.0.13.2 May 2012 FASTX_QUALITY_STATS(1)