Can anyone help me count the number of occurrences of strings based on their column position? Say I have 300 files with a record length of 1000, and I need to count how many times each string in columns 213 to 219 occurs. Some may be unique and some may be repeated. (8 Replies)
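A minimal Python sketch, assuming newline-terminated records and that "213 to 219" means 1-based, inclusive character positions (so the slice is line[212:219]). The function names are hypothetical; records shorter than 219 characters are skipped rather than counted, and counts from all files are added together.

```python
import sys
from collections import Counter

# 1-based, inclusive positions taken from the question (213 to 219).
START, END = 213, 219

def count_field(lines, start=START, end=END):
    """Count each distinct substring found at positions start..end of every record."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if len(line) >= end:  # skip records too short to contain the field
            counts[line[start - 1:end]] += 1
    return counts

def count_in_files(paths, start=START, end=END):
    """Add up the per-file counts over many files."""
    total = Counter()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            total.update(count_field(fh, start, end))
    return total

if __name__ == "__main__":
    # Usage (hypothetical): python count_field.py file1 file2 ...
    for value, n in count_in_files(sys.argv[1:]).most_common():
        print(n, value)
```

The same idea in awk, for comparison: awk '{ n[substr($0, 213, 7)]++ } END { for (k in n) print n[k], k }' file*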
If there's a file containing:
money king money queen money cat money also money king
All those strings are on one line in the file.
How can I find out how many times "money king" shows up in the line?
egrep -c "money king" won't work, because -c counts matching lines, not individual matches. (7 Replies)
Hi,
I have the following input in a file and need the output mentioned below (a counter for every occurrence of a field, increased by 1 each time the field repeats).
Input:
919143110065
919143110065
919143110052
918648846132
919143110012
918648873782
919143110152
919143110152
919143110152... (2 Replies)
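The expected output is cut off in the post, so this sketch assumes the goal is to print each value next to its running occurrence count (1 the first time a number appears, 2 the second time, and so on). running_counts is a hypothetical helper name, and the sample uses only the values shown above.

```python
from collections import defaultdict

def running_counts(values):
    """Pair each value with how many times it has been seen so far, starting at 1."""
    seen = defaultdict(int)
    result = []
    for value in values:
        seen[value] += 1
        result.append((value, seen[value]))
    return result

data = ["919143110065", "919143110065", "919143110052", "918648846132",
        "919143110012", "918648873782", "919143110152", "919143110152",
        "919143110152"]
for value, n in running_counts(data):
    print(value, n)
```

In awk the same running counter is a one-liner: awk '{ print $0, ++n[$0] }' file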
Hi,
Let's say an input looks like:
A|C|C|D
A|C|I|E
A|B|I|C
A|T|I|B
As the title of the thread explains, I am trying to get something like:
1|A=4
2|C=2|B=1|T=1
3|I=3|C=1
4|D=1|E=1|C=1|B=1
That is, a count of every character in each field, computed independently per field (the field number is the first column of the output), sorted... (4 Replies)
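A Python sketch that keeps one Counter per field. The post's sort rule is truncated; ordering by descending count while keeping first-seen order for ties (which is what Counter.most_common does) reproduces the sample output above. Function names are hypothetical.

```python
from collections import Counter

def field_counts(lines, sep="|"):
    """Count values in each field independently; returns one Counter per field."""
    counters = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # Grow the list if a row has more fields than any row seen so far.
        while len(counters) < len(fields):
            counters.append(Counter())
        for i, value in enumerate(fields):
            counters[i][value] += 1
    return counters

def format_counts(counters, sep="|"):
    """Render 'field|value=count|...', most frequent first, ties in first-seen order."""
    out = []
    for i, counter in enumerate(counters, start=1):
        parts = [f"{value}={n}" for value, n in counter.most_common()]
        out.append(sep.join([str(i)] + parts))
    return out

rows = ["A|C|C|D", "A|C|I|E", "A|B|I|C", "A|T|I|B"]
print("\n".join(format_counts(field_counts(rows))))
```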
Hi all,
I would like to process an input file like the one below:
col1 col2 col3 ...col100
1 A C E A ...
3 D E G A
5 T T A A
6 D C A G
How can I use a for loop to count the occurrences of letters in each column (just like uniq -c, but for every column)?
On top of that, I would also like... (8 Replies)
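A sketch that transposes the table with zip(*rows) and counts each column, printing the counts in a uniq -c-like layout. It assumes whitespace-separated fields and a header row naming each column. The sample header is shortened to a hypothetical col1..col5 because the post elides the rest, and since it is unclear whether the leading number is a row label or col1, every column is counted.

```python
from collections import Counter

def column_counts(lines):
    """Return {column_name: Counter} for a whitespace-separated table with a header row."""
    rows = [line.split() for line in lines if line.strip()]
    header, data = rows[0], rows[1:]
    # zip(*data) turns rows into columns; zip stops at the shortest row.
    return {name: Counter(column) for name, column in zip(header, zip(*data))}

table = [
    "col1 col2 col3 col4 col5",
    "1 A C E A",
    "3 D E G A",
    "5 T T A A",
    "6 D C A G",
]
for name, counter in column_counts(table).items():
    print(name)
    for value, n in sorted(counter.items()):
        print(f"{n:7d} {value}")  # roughly mimics the uniq -c layout
```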
Hello,
I have a table that looks like what is shown below:
AA
BB
CC
XY
PQ
RS
AA
BB
CC
XY
RS
I would like the total counts depending on the set they belong to:
If the search pattern is in {AA, BB, CC} --> count them as Type1 | wc -l (3 Replies)
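Only Type1 ({AA, BB, CC}) is defined before the post is cut off, so this sketch counts everything else under a hypothetical "Other" label; further sets can be added to the TYPES mapping, which is also a hypothetical name.

```python
from collections import Counter

# Only Type1 comes from the post; any other entries you add are your own grouping.
TYPES = {
    "Type1": {"AA", "BB", "CC"},
}

def count_by_type(values, types=TYPES, default="Other"):
    """Count values by the set they belong to; unmatched values go to `default`."""
    counts = Counter()
    for value in values:
        value = value.strip()
        if not value:
            continue  # ignore blank lines
        label = next((name for name, members in types.items() if value in members), default)
        counts[label] += 1
    return counts

table = ["AA", "BB", "CC", "XY", "PQ", "RS", "AA", "BB", "CC", "XY", "RS"]
print(count_by_type(table))
```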
Hello all,
I would like to ask for your help here:
I have a huge file with 2 columns. Part of it is:
sorted.txt:
kss23 rml.67lkj
kss23 zhh.6gf
kss23 nhd.09.fdd
kss23 hp.767.88.89
fl67 nmdsfs.56.df.67
fl67 kk.fgf.98.56.n
fl67 bgdgdfg.hjj.879.d
fl66 kl..hfh.76.ghg
fl66... (5 Replies)
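The rest of this question is truncated. Assuming the aim is to count how many rows each column-1 key has, a file that is already grouped by column 1 can be streamed with itertools.groupby, which keeps memory use constant even for a huge file. groupby only needs identical keys to be adjacent: kss23 comes before fl67 in the sample, so it is grouped rather than alphabetically sorted. key_counts is a hypothetical name.

```python
import sys
from itertools import groupby

def key_counts(lines):
    """Yield (key, row_count) for each run of identical column-1 keys."""
    # Split off column 1 only; blank lines are ignored.
    rows = (line.split(None, 1) for line in lines if line.strip())
    for key, group in groupby(rows, key=lambda fields: fields[0]):
        yield key, sum(1 for _ in group)

if __name__ == "__main__":
    # Usage (hypothetical): python key_counts.py sorted.txt
    for path in sys.argv[1:]:
        with open(path) as fh:
            for key, n in key_counts(fh):
                print(key, n)
```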
Hi All,
Let's say an input looks like:
C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11
----------------------------------
1|0123452|C501|Z|Z|Z|E|E|E|E|E|E|E
1|0156123|C501|X|X|X|E|E|E|E|E|E|E
1|0178903|C501|Z|Z|Z|E|E|E|E|E|E|E
1|0127896|C501|Z|Z|Z|E|E|E|E|E|E|E
1|0981678|C501|X|X|X|E|E|E|E|E|E|E
... (6 Replies)
Hello Team,
I need your help on the following:
My input file a.txt is as below:
3330690|373846|108471
3330690|373846|108471
0640829|459725|100001
0640829|459725|100001
3330690|373847|108471
Here rows 1 and 2 of column 1 are identical, but the corresponding column 2 values are... (4 Replies)
Discussion started by: angshuman
4 Replies
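The question is cut off, so this sketch assumes the goal is to find column-1 values whose column-2 values differ between rows. It collects the distinct column-2 values for each column-1 key; distinct_second_column is a hypothetical helper.

```python
from collections import defaultdict

def distinct_second_column(lines, sep="|"):
    """Map each column-1 value to the set of column-2 values seen with it."""
    seen = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) >= 2:
            seen[fields[0]].add(fields[1])
    return seen

rows = ["3330690|373846|108471", "3330690|373846|108471", "0640829|459725|100001",
        "0640829|459725|100001", "3330690|373847|108471"]
for key, values in distinct_second_column(rows).items():
    status = "differs" if len(values) > 1 else "same"
    print(key, len(values), status)
```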
LEARN ABOUT DEBIAN
fastx_quality_stats
FASTX_QUALITY_STATS(1) User Commands FASTX_QUALITY_STATS(1)
NAME
fastx_quality_stats - FASTX Statistics
DESCRIPTION
usage: fastx_quality_stats [-h] [-N] [-i INFILE] [-o OUTFILE]
Part of FASTX Toolkit 0.0.13.2 by A. Gordon (gordon@cshl.edu)
[-h] = This helpful help screen.
[-i INFILE] = FASTQ input file. default is STDIN.
[-o OUTFILE] = TEXT output file. default is STDOUT.
[-N] = New output format (with more information per nucleotide/cycle).
The *OLD* output TEXT file will have the following fields (one row per column):
column = column number (1 to 36 for a 36-cycle Solexa read file)
count = number of bases found in this column.
min = Lowest quality score value found in this column.
max = Highest quality score value found in this column.
sum = Sum of quality score values for this column.
mean = Mean quality score value for this column.
Q1 = 1st quartile quality score.
med = Median quality score.
Q3 = 3rd quartile quality score.
IQR = Inter-Quartile range (Q3-Q1).
lW = 'Left-Whisker' value (for boxplotting).
rW = 'Right-Whisker' value (for boxplotting).
A_Count = Count of 'A' nucleotides found in this column.
C_Count = Count of 'C' nucleotides found in this column.
G_Count = Count of 'G' nucleotides found in this column.
T_Count = Count of 'T' nucleotides found in this column.
N_Count = Count of 'N' nucleotides found in this column.
max-count = max. number of bases (in all cycles)
The *NEW* output format:
cycle (previously called 'column') = cycle number
max-count
For each nucleotide in the cycle (ALL/A/C/G/T/N):
count = number of bases found in this column.
min = Lowest quality score value found in this column.
max = Highest quality score value found in this column.
sum = Sum of quality score values for this column.
mean = Mean quality score value for this column.
Q1 = 1st quartile quality score.
med = Median quality score.
Q3 = 3rd quartile quality score.
IQR = Inter-Quartile range (Q3-Q1).
lW = 'Left-Whisker' value (for boxplotting).
rW = 'Right-Whisker' value (for boxplotting).
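To make the field lists above concrete, here is a rough Python sketch of how such per-column statistics could be computed. It is not the FASTX implementation: it assumes quality scores are already decoded to integers, computes quartiles with the median-of-halves method, and uses Tukey-style whiskers (the most extreme values within 1.5*IQR of the box). This man page does not say which definitions FASTX uses, so real output may differ. The function names and row layout are hypothetical.

```python
from statistics import mean, median

def quartiles(s):
    """Q1, median, Q3 of a sorted, non-empty list (median-of-halves; an assumption)."""
    n = len(s)
    if n == 1:
        return s[0], s[0], s[0]
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]  # drops the middle value when n is odd
    return median(lower), median(s), median(upper)

def summarize(values):
    """Fields shared by both formats: count, min, max, sum, mean, Q1, med, Q3, IQR, lW, rW."""
    s = sorted(values)
    q1, med, q3 = quartiles(s)
    iqr = q3 - q1
    # Tukey-style whiskers (assumed): most extreme values within 1.5*IQR of the box.
    lw = min(v for v in s if v >= q1 - 1.5 * iqr)
    rw = max(v for v in s if v <= q3 + 1.5 * iqr)
    return {"count": len(s), "min": s[0], "max": s[-1], "sum": sum(s),
            "mean": mean(s), "Q1": q1, "med": med, "Q3": q3, "IQR": iqr,
            "lW": lw, "rW": rw}

def per_cycle(reads, new_format=False):
    """reads: list of (sequence, [integer quality scores]); each quality list matches its sequence."""
    length = max(len(seq) for seq, _ in reads)
    cycles = []
    for i in range(length):
        groups = {"ALL": []}  # quality scores at this position, overall and per base
        for seq, quals in reads:
            if i < len(seq):
                groups["ALL"].append(quals[i])
                groups.setdefault(seq[i], []).append(quals[i])
        cycles.append(groups)
    max_count = max(len(g["ALL"]) for g in cycles)  # largest number of bases in any cycle
    rows = []
    for number, groups in enumerate(cycles, start=1):
        if new_format:
            # NEW format: cycle, max-count, then statistics for ALL and for each base.
            row = {"cycle": number, "max-count": max_count}
            row.update({base: summarize(q) for base, q in groups.items()})
        else:
            # OLD format: statistics over all bases plus per-nucleotide counts.
            row = {"column": number, **summarize(groups["ALL"])}
            for base in "ACGTN":
                row[f"{base}_Count"] = len(groups.get(base, []))
            row["max-count"] = max_count
        rows.append(row)
    return rows

reads = [("AC", [40, 30]), ("AG", [20, 30]), ("TC", [30, 2])]
for row in per_cycle(reads):
    print(row)
```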
SEE ALSO
The quality of this automatically generated manpage might be insufficient. It is suggested to visit
http://hannonlab.cshl.edu/fastx_toolkit/commandline.html
to get a better layout as well as an overview about connected FASTX tools.
fastx_quality_stats 0.0.13.2 May 2012 FASTX_QUALITY_STATS(1)