Split a huge data into few different files?!
Post 302367199 by Scrutinizer, Sunday 1st of November 2009, 11:10:48 PM
Quote:
Originally Posted by patrick87
Hi Scrutinizer, do you have any idea how to get my desired output?
I tried replacing the spaces in the header with "_" and ran your suggested code.
Unfortunately, it still doesn't work.
Thanks a lot for your advice.
Hi patrick87,

The problem is that I put random spaces and ":" characters inside the labels of the input examples you gave, and both scripts still work as expected. I have to assume your real-world data sets somehow do not correspond to the input format you provided. Please take a small part (say 7 records) of an actual, anonymized file, run my scripts on it to see whether they also produce the strange results, and then post that example input here, along with the resulting file names and their contents, so I can have a look.

S.
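
For readers landing on this thread later, here is a minimal awk sketch of the kind of split being discussed. It is not the script referred to above; it assumes the input consists of records that each start with a '>' header line, and it replaces spaces and ':' in the header so the header can be used as an output file name:

# split_records.awk - a sketch, not the original script from this thread.
# Assumes each record starts with a '>' header line; the header (minus '>')
# becomes the output file name after spaces and ':' are replaced with '_'.
/^>/ {
    if (out != "") close(out)        # finish the previous record's file
    name = substr($0, 2)             # drop the leading '>'
    gsub(/[ :]/, "_", name)          # sanitize spaces and ':' for the file name
    out = name ".txt"
}
out != "" { print > out }            # header and data lines go to the current file

Run it as: awk -f split_records.awk yourdata.txt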
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Perl script error to split huge data one by one.

Below is my perl script: #!/usr/bin/perl open(FILE,"$ARGV") or die "$!"; @DATA = <FILE>; close FILE; $join = join("",@DATA); @array = split( ">",$join); for($i=0;$i<=scalar(@array);$i++){ system ("/home/bin/./program_name_count_length MULTI_sequence_DATA_FILE -d... (5 Replies)
Discussion started by: patrick87
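As a side note on the quoted script: slurping the whole file with @DATA = <FILE> and then splitting on ">" keeps everything in memory. A minimal awk sketch that handles one '>'-delimited record at a time instead (file names here are placeholders, not the poster's):

# Write each '>'-delimited record of input_file to its own numbered file
# (rec_1, rec_2, ...) without reading the whole file into memory first.
awk 'BEGIN { RS = ">" }
     NR > 1 {
         out = "rec_" (NR - 1)
         printf ">%s", $0 > out      # restore the '>' that RS consumed
         close(out)
     }' input_file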

2. Shell Programming and Scripting

Problem running Perl Script with huge data files

Hello everyone, I have a Perl script that reads two types of data files (txt and XML). These data files are huge and there are many of them. I am using something like this : foreach my $t (@text) { open TEXT, $t or die "Cannot open $t for reading: $!\n"; while(my $line=<TEXT>){ ... (4 Replies)
Discussion started by: ad23

3. Shell Programming and Scripting

Help- counting delimiter in a huge file and split data into 2 files

I'm new to Linux scripting and not sure how to filter out bad records from huge flat files (over 1.3GB each). The delimiter is a semicolon ";". Here is a sample of 5 lines from the file: Name1;phone1;address1;city1;state1;zipcode1 Name2;phone2;address2;city2;state2;zipcode2;comment... (7 Replies)
Discussion started by: lv99
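A minimal awk sketch of the kind of filter being asked for here, assuming a good record has exactly six ';'-separated fields as in the first sample line (the output file names are placeholders):

# Records with exactly 6 fields go to good.txt, everything else to bad.txt.
awk -F';' '{ print > (NF == 6 ? "good.txt" : "bad.txt") }' hugefile.txt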

4. Shell Programming and Scripting

how to split a huge file by every 100 lines

into small files. I need to add head.txt at the beginning and tail.txt at the end of each small file, and name them q1.xml, q2.xml, q3.xml, .... Thank you very much. (2 Replies)
Discussion started by: dtdt
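A short shell sketch of one way to do this, assuming the input is big.xml and that head.txt and tail.txt already exist (all names here are placeholders):

# Split big.xml into 100-line chunks, then wrap each chunk with head.txt and
# tail.txt and name the results q1.xml, q2.xml, ...
split -l 100 big.xml chunk_
n=1
for f in chunk_*; do
    cat head.txt "$f" tail.txt > "q$n.xml"
    rm "$f"
    n=$((n + 1))
done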

5. Shell Programming and Scripting

Split a file into several files using a data

Hi All, I have a file (File1) with data like below: 102100|LName|Gender|Company|Branch|Bday|Salary|Age 102100|bbbb|male|cccc|dddd|19900814|15000|20| 102101|asdg|male|gggg|ksgu|19911216||| 102102|bdbm|male|kkkk|acke|19931018||23| 102102|kfjg|male|kkkc|gkgg|19921213|14000|24|... (2 Replies)
Discussion started by: sarav.shan
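A minimal awk sketch, assuming the intent is to split File1 on the first '|'-separated field and repeat the header line in each output file:

# One output file per distinct first field (File1_102100, File1_102101, ...),
# each starting with the header line.
awk -F'|' 'NR == 1 { hdr = $0; next }
           {
               out = "File1_" $1
               if (!($1 in seen)) { seen[$1] = 1; print hdr > out }
               print > out
           }' File1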

6. UNIX for Dummies Questions & Answers

Split a huge 7 GB File Based on Pattern into 4 files

Hi, I have a huge 7 GB file which has around 1 million records. I want to split this file into 4 files containing around 250k messages each. Please help me, as the split command cannot work here since it might miss tags. Format of the file is as below: <!--###### ###### START-->... (6 Replies)
Discussion started by: KishM
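A sketch of one awk approach, assuming every message begins with a line containing the START marker shown above and that roughly 250000 messages should go into each part (names and counts are placeholders):

# Start a new output file every 250000 messages; a new message begins at a
# START marker line, so no message is ever cut in half.
awk '/START-->/ {
         if (msg % 250000 == 0) {
             if (out != "") close(out)
             out = "part" (++part) ".txt"    # part1.txt, part2.txt, ...
         }
         msg++
     }
     out != "" { print > out }' bigfile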

7. Shell Programming and Scripting

Split a folder with huge number of files in n folders

We have a folder XYZ with a large number of files (>350,000). How can I split the folder and create, say, 10 of them, XYZ1 to XYZ10, with 35,000 files each? (It doesn't matter which files go where.) (12 Replies)
Discussion started by: AlokKumbhare
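A small shell sketch of one way to do it, assuming plain file names and that the new directories should sit next to XYZ (names are placeholders):

# Move the files in XYZ into XYZ1..XYZ10, 35000 files per directory.
cd XYZ || exit 1
i=0
for f in *; do
    d="../XYZ$(( i / 35000 + 1 ))"   # files 0-34999 -> XYZ1, 35000-69999 -> XYZ2, ...
    mkdir -p "$d"
    mv -- "$f" "$d"/
    i=$(( i + 1 ))
done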

8. Shell Programming and Scripting

Split JSON to different data files

Hi Gurus, I have the JSON file below, and I want to rewrite it into a new file. I would appreciate it if anyone can help me with a solution... I can't use jq. { "_id": "3ad893cb4cf1560add7b4caffd4b6126", "_rev": "1-1f0ce165e1d210319cf6e9f9c6ff654f", "name":... (4 Replies)
Discussion started by: manas_ranjan

9. UNIX for Advanced & Expert Users

File comparisons for huge data files (around 60G) - need the best optimized way to do this

I have 2 large files (.dat), around 70 GB, 12 columns, but the data is not sorted in either file. I need your input on the best optimized method/command to achieve this and to redirect the non-matching lines to a third file (diff.dat). File 1 - 15 columns File 2 - 15 columns Data is... (9 Replies)
Discussion started by: kartikirans
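If sorting the files first is acceptable, a minimal sketch (file and directory names are placeholders; it assumes no line repeats within the same file and that there is enough temporary space to sort inputs of this size):

# Lines that appear in only one of the two files end up in diff.dat.
sort -T /big/tmp file1.dat file2.dat | uniq -u > diff.dat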

10. Solaris

Split huge File System

Gents, I have a huge NAS file system at /sys with a size of 10 TB, and I want to split it into separate 1 TB file systems to be mounted on the server. How can I do that without changing anything in the source? I appreciate your support. (1 Reply)
Discussion started by: AbuAliiiiiiiiii