Post 302696983 by perl_beginner on Thursday 6th of September 2012 01:41:11 AM
Help with reformat data structure

Input file:
Code:
bv|111259484|pir||T49736_real_data
bv|159484|pir||T9736_data_figure
bv|113584|prf|T4736|truth
bv|113584|pir||T4736_truth

Desired output:
Code:
bv|111259484|pir|T49736|real_data
bv|159484|pir|T9736|data_figure
bv|113584|prf|T4736|truth
bv|113584|pir|T4736|truth

Whenever a line contains "pir||", I want to replace "pir||" with "pir|" and then replace the next "_" that follows it with "|".
Command I tried:
Code:
awk '{gsub(/pir||/,"pir|",$1);print}' input_file.txt

I was only able to replace "pir||" with "pir|", but I don't know how to replace the following "_" with "|".
Thanks for any advice.
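
One possible way to do both replacements in a single pass is sketched below. It is only an illustration: the function name reformat_pir is made up for this example, and it assumes "pir||" appears at most once per line. It uses index() and substr() rather than a regular expression, because an unescaped | in an awk regex means alternation, so /pir||/ is not a pattern for the literal text "pir||".

```shell
# Hypothetical helper (name chosen for this example only).
# Reads the named files, or standard input when no file is given.
reformat_pir() {
  awk '{
    # Position of the literal text "pir||" (0 if the line lacks it).
    i = index($0, "pir||")
    if (i > 0) {
      head = substr($0, 1, i + 3)   # everything up to and including "pir|"
      tail = substr($0, i + 5)      # everything after "pir||"
      sub(/_/, "|", tail)           # first "_" after "pir||" becomes "|"
      $0 = head tail
    }
    print
  }' "$@"
}
```

Called as `reformat_pir input_file.txt`, this should leave lines without "pir||" (such as the "prf" line) unchanged and turn the other three sample lines into the desired output shown above.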
 
