Splitting a file in a directory!


 
# 1  
Old 12-14-2011

Hey guys!


So I have a directory with 82 genomes in it, and I have to split each of them, one by one, into genes. I tried using the split function, like this:

Code:
#!/usr/bin/perl

  use strict;
  use warnings;

  my $data = 'location';

  my @values = split('>', $data);

  foreach my $val (@values) {
    print "$val\n";
  }

  exit;

At first I ran it from my home directory and specified the location of the file; then I ran it inside the directory, using just the file name. I keep getting errors, though. Does anyone know how to fix it? And should I use the file names with the .faa extension, or drop the extension and use just the file name?

Thanks!

Last edited by radoulov; 12-14-2011 at 06:23 AM.. Reason: Code tags!
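
Going only by the script as posted, one likely cause is that `split` is applied to the string stored in `$data` (the literal text 'location', or a file name) rather than to the contents of the file, so there is nothing to split. The following is a minimal sketch of reading the file first, not a definitive fix. The sub names `read_file` and `split_records` are made up for illustration, and the file name is taken from the command line. The name passed in has to match the name on disk exactly, so a file saved with a `.faa` extension keeps that extension.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the whole contents of a file as one string.
sub read_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!";
    local $/;              # undefined record separator: read everything at once
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Hypothetical helper: split text on '>' and drop empty pieces,
# such as the empty string before the very first '>'.
sub split_records {
    my ($text) = @_;
    return grep { /\S/ } split(/>/, $text);
}

# Usage: perl script.pl path/to/file.faa
if (@ARGV) {
    my @values = split_records(read_file($ARGV[0]));
    print scalar(@values), " records found\n";
}
```

Note that `split` removes the `>` itself, so each record starts with the header text without its leading `>`. This sketch also assumes that `>` never appears inside a header or sequence line.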
# 2  
Old 12-14-2011
I'm afraid I don't understand what you mean by genomes. Please be clearer, and also post some sample input and the expected output.
# 3  
Old 12-14-2011
Okay, so the genome has a load of genes in it, all just characters, and each new gene starts with a >. Here's an example of three genes taken from a genome:


Code:
>AFL2G_00003 | Aspergillus flavus hypothetical protein similar to branched-chain amino acid aminotransferase protein (translation) (287 aa)
MTSMNKVFSGYYERKARLDNSGNRFAKGIAYVQGSFVRLADARVPLLDEGFMHSDLTYDV
PSVWDGRFFRLDDHLSRLEDSCEKMRLKIPLSRDEVKQTLREMVAKSGIEDAFVELIVTR
GLKGVRGNKPEDLFDNHLYLIVMPYVWVMEPAMQPTGGTAIIARTVRRTPPEGSGFNIVL
VKDGIIYTPDRGVLEGITRKSVFDIAQAKNIEVRVQMVPLEHAYHADEIFMCTTAGGIMP
ITKLDGKPIRNGKVGPLTTKIWDEYWAMHYDPKYSSAIDYKGHEGN*
>AFL2G_00004 | Aspergillus flavus hypothetical protein similar to MSF tranporter (translation) (366 aa)
MANSWIHERFGQRGIALLGTGMHVISYFATTQHPPFPLLITIFILAGLGNGIVDASWNAW
IGAMHNSSQLMGILHAFYGLGAALAPLTATYVITQRGCMWYHFYYIMGIAATIEFVTSVA
AFWSARGSLVEASELGVPGDNVQQDDRDSSRRNTTLKNPTLESLGLVSTWIISLFLLVYV
GIEVTVGGWVFTFLVDLRNTPPSVAGVVTFMYWGGLTVGRVCLGFITPYFKRQRLVIVVY
LLACVVCHIGFWLATELHLSMIAVTLLGFFLGPLYPEAVIAQAALLPKHLHVAAVGFACA
LGSAGGCIFPFITGAIAKAHGIKVLHPVVLAMLMLCLILWFALPGQRRGTKEAASPAWSS
SPTRS*
>AFL2G_00005 | Aspergillus flavus hypothetical protein similar to class III aminotransferase (translation) (466 aa)
MGSAAEPAYLYKNVTHDPTVPSVKSAEGIYIFLENGQKILDATSGAAVSAIGHGVGRVKK
AIMSQLDQVEYCHPGFFPNTPAMDLADLLVESTGGKLSRACILGSGSEAVEAAMKLAYQY
FEEQSPNTRRTRFISRHGSWHGCTLGALALGDFKPRKTRFNSILTSNISHVSACDPYHGL
MENEDPETYVARLKDELDNEFQRLGPETVCAVFLEPMVGTALGCVTALPGYLQAVRDVCD
RYGALLVFDEIMCGMGRTGITHAWQEDGVAPDIELVGKGLAAGYGTISGLLVNDRVLDGL
RHGGGYFVHGQTYQSHPLGCAAAVEVQRIIKEENLVENCRKMGQYLGQQLKLHLGDHPYV
GDIRGRGLFWAVEFMADPPTKTPFSPAFTISKRMQSRGMERGYDICLFAATGAVDGCNGD
HVLLAPPYIVHKEDVDEIMDETGIMSVKQAIPTHGSSQKGGMNLQ*

So what I want to do is split the file wherever there's a > and give each gene a number. That way I can reassemble them with replacement and see which ones are going in and which are not. I'm basically splitting up a text file in order to rebuild it, and I think starting with the split and then numbering the pieces would be the easiest way to do it!
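
The split-then-number idea described above can also be done while reading, by setting Perl's input record separator `$/` to `>` so that each read returns one gene. This is only a sketch under that assumption. The sub name `number_records` is hypothetical, and it takes an already opened filehandle so that it works on any input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read '>'-separated records from an open filehandle
# and return a list of [number, record] pairs, numbered from 1.
sub number_records {
    my ($fh) = @_;
    local $/ = '>';                    # each read now ends at the next '>'
    my @numbered;
    my $count = 0;
    while (my $record = <$fh>) {
        chomp $record;                 # chomp strips the trailing '>' (the value of $/)
        next unless $record =~ /\S/;   # skip the empty chunk before the first '>'
        push @numbered, [ ++$count, ">$record" ];   # put the '>' back on the header
    }
    return @numbered;
}

# Usage: perl script.pl path/to/file.faa
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open '$ARGV[0]': $!";
    for my $pair (number_records($fh)) {
        my ($n, $record) = @$pair;
        my ($header) = split(/\n/, $record);   # first line of the record
        print "$n\t$header\n";
    }
    close($fh);
}
```

Run on a genome file, this prints one line per gene with its number and header, which gives a lookup table from number back to gene name.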

# 4  
Old 12-14-2011
What do "genomes" and "genes" refer to?
# 5  
Old 12-14-2011
A genome here is pretty much a text file that contains genes. There's an unknown number of genes in the text file, and each one begins with ">". I want to separate all the genes so that I have, say, 1000 small text files, each named with a unique number. I tried using a basic split function, but it doesn't appear to be working, so any help anyone can give would be greatly appreciated!
# 6  
Old 12-14-2011
Do you want the files named 'AFL2G_00003.txt', or just a unique (for that run of the script) number?
# 7  
Old 12-14-2011
Just a unique number, 1-1000 or something like that.
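
Putting the requirements from this thread together (a directory of genome files, one output file per gene, each named with a running number), a rough sketch might look like the following. Several details are assumptions that the thread does not state: that the genome files end in `.faa`, that each genome gets its own output subdirectory named after the genome file plus `_genes`, and that the gene files are called `1.txt`, `2.txt`, and so on. The sub name `split_genome` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Path qw(make_path);

# Hypothetical helper: split one '>'-separated genome file into numbered
# files (1.txt, 2.txt, ...) inside $out_dir; returns the number of genes written.
sub split_genome {
    my ($genome_file, $out_dir) = @_;
    make_path($out_dir);               # create the output directory if it is missing
    open(my $in, '<', $genome_file) or die "Cannot open '$genome_file': $!";
    local $/ = '>';                    # read one gene record per iteration
    my $count = 0;
    while (my $record = <$in>) {
        chomp $record;                 # strip the trailing '>'
        next unless $record =~ /\S/;   # skip the empty chunk before the first '>'
        $count++;
        my $out_file = "$out_dir/$count.txt";
        open(my $out, '>', $out_file) or die "Cannot write '$out_file': $!";
        print {$out} ">$record";       # restore the '>' on the header line
        close($out) or die "Cannot close '$out_file': $!";
    }
    close($in);
    return $count;
}

# Usage: perl script.pl /path/to/genome_directory
if (@ARGV) {
    my $dir = $ARGV[0];
    for my $genome (glob("$dir/*.faa")) {    # assumes the genomes end in .faa
        my $out_dir = "$dir/" . basename($genome, '.faa') . '_genes';
        my $n = split_genome($genome, $out_dir);
        print "$genome: $n genes written to $out_dir\n";
    }
}
```

Keeping one output directory per genome means the numbers can restart at 1 for each genome without files overwriting each other. `glob` splits its pattern on whitespace, so this sketch also assumes the directory path contains no spaces.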
 