Split a huge data into few different files?! Post: 302366659

Forum: Shell Programming and Scripting. Post 302366659 by patrick87 on Friday 30th of October 2009 04:49:39 AM

Input file data contents:

Code:

>seq_1
MSNQSPPQSQRPGHSHSHSHSHAGLASSTSSHSNPSANASYNLNGPRTGGDQRYRASVDA
>seq_2
AGAAGRGWGRDVTAAASPNPRNGGGRPASDLLSVGNAGGQASFASPETIDRWFEDLQHYE
>seq_3
ATLEEMAAASLDANFKEELSAIEQWFRVLSEAERTAALYSLLQSSTQVQMRFFVTVLQQM
ARADPITALLSPANPGQASMEAQMDAKLAAMGLKSPASPAVRQYARQSLSGDTYLSPHSA
>seq_4
TTLPPAPVSPTTTTQAEDAAAAATLASQRAKLKASSRISAPANILLGASGADGVKSPLWS
EKERVVERRSPSPSGRNVERPKSTGSTGEPAQPNNSHAGMNLSQSTGPPSASFLRSPAPD
>seq_5
FDSQLSPIVGGNWASMVNTPLMPMFGSKGGGEGGSFGGLASPGLDGATAKLGSWATGTTT
GQAGIVLDDVRKFRRSARISGSGATGFGGGALGGMYDDQPAQASTNGQQQRRVSPSQLNS
>seq_6
AQQNAINLGLAGLQQQQQQHQQQLRSGAASPGLSSQQAAVAAQQNWRNGLGSPAVDSSDQ
YSQHGMGAFGMGSPANLSANAQLANLFALQQQMMQQQQMQQLNMAAAAGIALTPVQMMGL
QQQQQQAMLSPGGRGFGMGMNGMGMNGMMGMGMGGMGSPRRSPRQSDRSPGGKTNLPSTV
.
.
.
.

Output file 1 contents:

Code:

>seq_1
MSNQSPPQSQRPGHSHSHSHSHAGLASSTSSHSNPSANASYNLNGPRTGGDQRYRASVDA
>seq_2
AGAAGRGWGRDVTAAASPNPRNGGGRPASDLLSVGNAGGQASFASPETIDRWFEDLQHYE
>seq_3
ATLEEMAAASLDANFKEELSAIEQWFRVLSEAERTAALYSLLQSSTQVQMRFFVTVLQQM
ARADPITALLSPANPGQASMEAQMDAKLAAMGLKSPASPAVRQYARQSLSGDTYLSPHSA

Output file 2 contents:

Code:

>seq_4
TTLPPAPVSPTTTTQAEDAAAAATLASQRAKLKASSRISAPANILLGASGADGVKSPLWS
EKERVVERRSPSPSGRNVERPKSTGSTGEPAQPNNSHAGMNLSQSTGPPSASFLRSPAPD
>seq_5
FDSQLSPIVGGNWASMVNTPLMPMFGSKGGGEGGSFGGLASPGLDGATAKLGSWATGTTT
GQAGIVLDDVRKFRRSARISGSGATGFGGGALGGMYDDQPAQASTNGQQQRRVSPSQLNS
>seq_6
AQQNAINLGLAGLQQQQQQHQQQLRSGAASPGLSSQQAAVAAQQNWRNGLGSPAVDSSDQ
YSQHGMGAFGMGSPANLSANAQLANLFALQQQMMQQQQMQQLNMAAAAGIALTPVQMMGL
QQQQQQAMLSPGGRGFGMGMNGMGMNGMMGMGMGGMGSPRRSPRQSDRSPGGKTNLPSTV

If I have a long list of sequences in one file, how can I divide them into separate files?
I need three sequences in each file.
For example, my source file has 300 sequences.
I need it split so that each file holds 3 sequences. The desired output is 100 files containing 3 sequences each.
Does anybody have an idea how to solve this?
Thanks a lot for all your guidance.
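
One possible approach, as a minimal sketch rather than a definitive answer: read the file line by line, count the header lines that start with ">", and open a new output file before every third header. The function name split_fasta, the output_ prefix, and the .fasta extension are made up for illustration; nothing in the post fixes the output names. Because it streams the input, it should also cope with very large files.

```python
from pathlib import Path


def split_fasta(src, out_dir, per_file=3, prefix="output_"):
    """Write consecutive groups of `per_file` records to numbered files.

    A record is a '>' header line plus every sequence line that follows it.
    The names used here (prefix, .fasta extension) are illustrative
    assumptions, not something given in the original question.
    Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    records = 0  # number of '>' headers seen so far

    with open(src) as fh:
        for line in fh:
            if line.startswith(">"):
                # Open a new output file before records 1, 4, 7, ...
                if records % per_file == 0:
                    if out is not None:
                        out.close()
                    path = out_dir / f"{prefix}{records // per_file + 1}.fasta"
                    out = open(path, "w")
                    written.append(path)
                records += 1
            if out is not None:  # ignore anything before the first header
                out.write(line)

    if out is not None:
        out.close()
    return written
```

Calling split_fasta("input.fasta", "chunks") on a file with 300 sequences would write chunks/output_1.fasta through chunks/output_100.fasta, three sequences each. If the number of sequences is not a multiple of three, the last file simply holds the remainder.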

Last edited by pludi; 10-30-2009 at 05:51 AM.. Reason: code tags, please...
