07-07-2010
Removing columns with dashes
My files look like this:
Quote:
>GHL8OVD01BNNCA Freq 4
TAGATGTGCCCGTGGGTTTCCCGTCAACACCGGATAGT-GCAGCA-TA
>GHL8OVD01CMQVT Freq 15
TTGATGTCGTGGGTTTCCCGTCAACACCGGCAAATAGT-GCAGCA-TA
>GHL8OVD01CMQVT Freq 50
TTGATGTGCCAGTTTCCCGTCTAGCAGCACTACCAGGACCTTCGC-TA
>GHL8OVD01CMQVW Freq 700
TTGATGTGTCCCGTCGACACCGGCAAATAGCAGCAGCA-TACCAG-AC
>GHL8OVD01A45V3 Freq 9
TTGATTCCCGTCGACACCGGCAAATAGCAGCAGCACTA-AGGACCTTC
>GHL8OVD01AV2U9 Freq 17
TTGATGTGCCAGCTTTCGCGTCGACACCGGCAAATAGT-GCAGCG-TA
I need to remove the columns where dashes are the majority. If any of the sequences has a character in that particular position, that character should be removed too. The IDs and Freqs should be kept intact. Thus, the resulting file should look like this:
Quote:
>GHL8OVD01BNNCA Freq 4
TAGATGTGCCCGTGGGTTTCCCGTCAACACCGGATAGTGCAGCATA
>GHL8OVD01CMQVT Freq 15
TTGATGTCGTGGGTTTCCCGTCAACACCGGCAAATAGTGCAGCATA
>GHL8OVD01CMQVT Freq 50
TTGATGTGCCAGTTTCCCGTCTAGCAGCACTACCAGGACTTCGCTA
>GHL8OVD01CMQVW Freq 700
TTGATGTGTCCCGTCGACACCGGCAAATAGCAGCAGCATACCAGAC
>GHL8OVD01A45V3 Freq 9
TTGATTCCCGTCGACACCGGCAAATAGCAGCAGCACTAAGGACCTC
>GHL8OVD01AV2U9 Freq 17
TTGATGTGCCAGCTTTCGCGTCGACACCGGCAAATAGTGCAGCGTA
Thanks in advance
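One possible approach, sketched in Python rather than as a definitive solution. It assumes every sequence sits on a single line, all sequences are aligned to the same length, and "majority" means strictly more than half of the sequences have a dash in that column. The function name `drop_gap_columns` and the `gap` parameter are hypothetical names chosen for illustration.

```python
def drop_gap_columns(lines, gap="-"):
    """Drop alignment columns where more than half of the sequences have a gap.

    Header lines (starting with '>') are passed through untouched.
    Assumes one sequence per line (an assumption, not stated in the post).
    """
    lines = [ln.rstrip("\n") for ln in lines]
    seqs = [ln for ln in lines if ln and not ln.startswith(">")]
    if not seqs:
        return lines

    # Count the gaps in every column across all sequences.
    width = max(len(s) for s in seqs)
    gap_counts = [0] * width
    for s in seqs:
        for i, ch in enumerate(s):
            if ch == gap:
                gap_counts[i] += 1

    # A column is dropped when gaps are the strict majority.
    drop = {i for i, n in enumerate(gap_counts) if 2 * n > len(seqs)}

    out = []
    for ln in lines:
        if not ln or ln.startswith(">"):
            out.append(ln)  # keep IDs and Freqs intact
        else:
            out.append("".join(ch for i, ch in enumerate(ln) if i not in drop))
    return out


# Sample records copied from the post above.
sample = [
    ">GHL8OVD01BNNCA Freq 4",
    "TAGATGTGCCCGTGGGTTTCCCGTCAACACCGGATAGT-GCAGCA-TA",
    ">GHL8OVD01CMQVT Freq 15",
    "TTGATGTCGTGGGTTTCCCGTCAACACCGGCAAATAGT-GCAGCA-TA",
    ">GHL8OVD01CMQVT Freq 50",
    "TTGATGTGCCAGTTTCCCGTCTAGCAGCACTACCAGGACCTTCGC-TA",
    ">GHL8OVD01CMQVW Freq 700",
    "TTGATGTGTCCCGTCGACACCGGCAAATAGCAGCAGCA-TACCAG-AC",
    ">GHL8OVD01A45V3 Freq 9",
    "TTGATTCCCGTCGACACCGGCAAATAGCAGCAGCACTA-AGGACCTTC",
    ">GHL8OVD01AV2U9 Freq 17",
    "TTGATGTGCCAGCTTTCGCGTCGACACCGGCAAATAGT-GCAGCG-TA",
]

for line in drop_gap_columns(sample):
    print(line)
```

The file has to be read in full before any line can be written, because whether a column is dropped depends on all sequences. In the sample, columns 39 and 46 hold a dash in five of the six sequences, so both are removed, including the C and T that the third and fifth sequences carry at those positions.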