Modification of perl script to split a large file into chunks of 5000 characters
Post 303017077 by jim mcnamara on Tuesday 8th of May 2018 11:46:03 PM
Old 05-09-2018
That may also be why your perl has issues. UTF-8 can encode all 1,112,064 valid Unicode code points, so a single UTF-8 character may occupy 8, 16, 24, or 32 bits (1 to 4 bytes).

Fixing the perl script requires an understanding of wide characters, which are a locale-based "datatype", sort of. Help is here:
Perl Programming/Unicode UTF-8 - Wikibooks, open books for an open world

Recent Linux awk (version 4.2 onward) splits UTF-8 encoded records into fields using wide characters. In perl, -a forces the split and places the result in the @F array. Here is a perl sample and an awk sample that do the same thing on UTF-8 files.
Code:
perl -CSD -aF'\N{U+1f4a9}' -nle 'print $F[0]' somefile.txt  # $F[0] is the same as awk's $1 variable

awk -F$'\U0001f4a9' '{print $1}' somefile.txt  # or $'\u007c' for 4-digit code points

In both samples the code point (U+1F4A9 here) is the field delimiter. All of this is explained in the link.