03-20-2010
Script to split text files
Hi All,
I'm fairly new to scripting, so I need a little help to get started with this problem.
I don't mind whether I use awk, bash or something else; I don't really know which would be best suited to the problem.
Let's say I have a 10000-line text file. I would like to split it into a few smaller files, something like:
10 lines, say the last 10 lines
100 lines, say the first 100 lines
1000 lines, say the last 1000 lines
5000 lines, say the middle 5000 lines
This I could probably manage with head & tail etc.
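For the 10000-line case, the head and tail approach might look something like the sketch below. The function name fixed_split and the output suffixes are made up for illustration, and the sed range for the middle slice assumes the input has exactly 10000 lines.

```shell
#!/bin/bash
# Sketch only: fixed-size slices of a 10000-line file.
# fixed_split and the ".10"/".100"/... suffixes are hypothetical names.
fixed_split() {
    local in=$1 out=$2
    tail -n 10   "$in" > "$out.10"            # last 10 lines
    head -n 100  "$in" > "$out.100"           # first 100 lines
    tail -n 1000 "$in" > "$out.1000"          # last 1000 lines
    sed -n '2501,7500p' "$in" > "$out.5000"   # middle 5000 lines (assumes 10000 total)
}
```

Calling it as `fixed_split big.txt part` would write part.10, part.100, part.1000 and part.5000 next to the input.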
However, if my text file were only 1000 lines long, this would not work so well. I'd get the 10 and 100 lines OK, but the 3rd would give me what I already have, and I guess the 4th would fail. What I would actually want is more like:
1 line
10 lines
100 lines
500 lines
Similarly, for a text file much larger than 10000 lines, I'd want it to scale the same way in the other direction, e.g. a 100k-line file = 100, 1000, 10000, 50000.
The number of lines does not need to be exact either. I would not mind doing the splits based on a percentage of the lines in the original file. Nor would I mind if lines in the original file were selected at random.
Basically, I just want a set of small, medium, large and larger files of whatever size, but proportional to the original. The files would not need to be unique either: line 1 in the small file and then lines 1-10 in the medium file is fine, though if it's easier I would not mind lines 2-11 in the second file.
I hope I've not over-complicated this explanation...
Would somebody please give me a steer on where to start? What should I use for this: awk? Should I try to use percentages, or try to work out absolutes that work in every situation?
Many thanks!
Phil.
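One possible starting point for the percentage idea is sketched below in bash. It counts the lines with wc, then derives the four sizes as 0.1%, 1%, 10% and 50% of the total, which reproduces the 10/100/1000/5000 and 1/10/100/500 examples above. The function name prop_split and the output suffixes are hypothetical, and the choice of last/first/last/middle simply mirrors the example in the question; this is a sketch under those assumptions, not a definitive answer.

```shell
#!/bin/bash
# Sketch only: split INPUT into four files whose sizes are proportional
# to the input length. prop_split and the suffixes are hypothetical names.
prop_split() {
    local in=$1 out=$2 total small medium large larger start
    total=$(( $(wc -l < "$in") ))

    # 0.1%, 1%, 10% and 50% of the total, never less than 1 line.
    small=$((  total / 1000 > 0 ? total / 1000 : 1 ))
    medium=$(( total / 100  > 0 ? total / 100  : 1 ))
    large=$((  total / 10   > 0 ? total / 10   : 1 ))
    larger=$(( total / 2    > 0 ? total / 2    : 1 ))

    tail -n "$small"  "$in" > "$out.small"     # last lines
    head -n "$medium" "$in" > "$out.medium"    # first lines
    tail -n "$large"  "$in" > "$out.large"     # last lines

    # Middle slice: centre a window of $larger lines in the file.
    start=$(( (total - larger) / 2 + 1 ))
    sed -n "${start},$(( start + larger - 1 ))p" "$in" > "$out.larger"
}
```

For a 1000-line input, `prop_split input.txt part` would give files of 1, 10, 100 and 500 lines. If random selection is preferred and GNU coreutils is available, `shuf -n "$small" "$in"` could replace the head/tail calls.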