Splitting a file in a directory! Post 302581801 by PerlNutt on Wednesday 14th of December 2011 05:22:08 AM
Okay, so the genome contains a lot of genes, each just a string of characters, and each new gene starts with a > character. Here's an example of three genes taken from a genome:


Code:
>AFL2G_00003 | Aspergillus flavus hypothetical protein similar to branched-chain amino acid aminotransferase protein (translation) (287 aa)
MTSMNKVFSGYYERKARLDNSGNRFAKGIAYVQGSFVRLADARVPLLDEGFMHSDLTYDV
PSVWDGRFFRLDDHLSRLEDSCEKMRLKIPLSRDEVKQTLREMVAKSGIEDAFVELIVTR
GLKGVRGNKPEDLFDNHLYLIVMPYVWVMEPAMQPTGGTAIIARTVRRTPPEGSGFNIVL
VKDGIIYTPDRGVLEGITRKSVFDIAQAKNIEVRVQMVPLEHAYHADEIFMCTTAGGIMP
ITKLDGKPIRNGKVGPLTTKIWDEYWAMHYDPKYSSAIDYKGHEGN*
>AFL2G_00004 | Aspergillus flavus hypothetical protein similar to MSF tranporter (translation) (366 aa)
MANSWIHERFGQRGIALLGTGMHVISYFATTQHPPFPLLITIFILAGLGNGIVDASWNAW
IGAMHNSSQLMGILHAFYGLGAALAPLTATYVITQRGCMWYHFYYIMGIAATIEFVTSVA
AFWSARGSLVEASELGVPGDNVQQDDRDSSRRNTTLKNPTLESLGLVSTWIISLFLLVYV
GIEVTVGGWVFTFLVDLRNTPPSVAGVVTFMYWGGLTVGRVCLGFITPYFKRQRLVIVVY
LLACVVCHIGFWLATELHLSMIAVTLLGFFLGPLYPEAVIAQAALLPKHLHVAAVGFACA
LGSAGGCIFPFITGAIAKAHGIKVLHPVVLAMLMLCLILWFALPGQRRGTKEAASPAWSS
SPTRS*
>AFL2G_00005 | Aspergillus flavus hypothetical protein similar to class III aminotransferase (translation) (466 aa)
MGSAAEPAYLYKNVTHDPTVPSVKSAEGIYIFLENGQKILDATSGAAVSAIGHGVGRVKK
AIMSQLDQVEYCHPGFFPNTPAMDLADLLVESTGGKLSRACILGSGSEAVEAAMKLAYQY
FEEQSPNTRRTRFISRHGSWHGCTLGALALGDFKPRKTRFNSILTSNISHVSACDPYHGL
MENEDPETYVARLKDELDNEFQRLGPETVCAVFLEPMVGTALGCVTALPGYLQAVRDVCD
RYGALLVFDEIMCGMGRTGITHAWQEDGVAPDIELVGKGLAAGYGTISGLLVNDRVLDGL
RHGGGYFVHGQTYQSHPLGCAAAVEVQRIIKEENLVENCRKMGQYLGQQLKLHLGDHPYV
GDIRGRGLFWAVEFMADPPTKTPFSPAFTISKRMQSRGMERGYDICLFAATGAVDGCNGD
HVLLAPPYIVHKEDVDEIMDETGIMSVKQAIPTHGSSQKGGMNLQ*

So what I want to do is split the file wherever there's a > and give each gene a number. That way I can reassemble the genes with replacement and see which ones are going in and which are not. I'm basically splitting up a text file in order to rebuild it, and I think doing the split first and then numbering the pieces would be the easiest way to do it!
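
A minimal sketch of the split-and-number step in Python, offered as one possible approach rather than the only way to do it. It assumes every record starts on a line beginning with > and that "reassemble with replacement" means drawing gene numbers at random with repeats allowed. The function names (`split_fasta`, `number_records`, `resample`) and the tiny sample input are made up for illustration.

```python
import random


def split_fasta(text):
    """Split FASTA-style text into records, one per line starting with '>'."""
    records = []
    current = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current:
                records.append("\n".join(current))
            current = [line]
        elif current and line.strip():
            # Non-blank line after a header: part of the current record.
            current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def number_records(records):
    """Give each record a number, starting at 1, in file order."""
    return {number: record for number, record in enumerate(records, start=1)}


def resample(numbered, rng=None):
    """Draw as many gene numbers as there are genes, with replacement.

    Assumption: this is what 'reassemble with replacement' means here.
    """
    rng = rng or random.Random()
    ids = sorted(numbered)
    return [rng.choice(ids) for _ in ids]


# Hypothetical two-gene input, much shorter than the real data.
sample = ">gene_a\nMTSM\nPSVW\n>gene_b\nMANS\n"
genes = number_records(split_fasta(sample))
picks = resample(genes)
missing = sorted(set(genes) - set(picks))
print("drawn:", picks, "left out:", missing)
```

Keeping the numbers separate from the record text means the resampled file can be rebuilt by joining `genes[n]` for each drawn `n`, and the numbers that were never drawn are the genes that did not go in.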

