Full Discussion: Subsampling a large file
Post 302408798 by Boltzmann on Tuesday 30th of March 2010 01:18:03 PM

Hello Everyone,

I have to subsample a large text file (17,021,811 lines). I need to keep every other block of 290 lines: keep the first 290 lines, throw away the next 290, keep the next 290, and so on.

So, I wrote the following bash script:

Code:
 
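i=0
start=1
stop=290
LIMIT=29449

while [ "$i" -lt "$LIMIT" ]
do
    # Append lines start..stop of the source to the destination
    sed -n "${start},${stop}p" file_src >> file_dest

    i=$((i+1))
    # Each kept block begins 2*290 lines after the previous one
    start=$((1+2*290*i))
    stop=$((start+289))
done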

This works, but it is taking forever, and I am wondering whether there is a more efficient way.

Thank you very much in advance for any advice.

Last edited by Boltzmann; 03-30-2010 at 06:58 PM..
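
One reason the loop is slow is that every sed call reads the source file from the top, so the file is scanned once per kept block. A possible single-pass alternative is sketched below with awk, which keeps a line when its position inside each 580-line window falls in the first 290. The names demo_src and demo_dest and the 1000-line sample input are assumptions made so the sketch runs on its own; they are not from the post.

```shell
# Sketch only: demo_src / demo_dest are hypothetical stand-ins for
# file_src / file_dest, and the sample input is generated with seq.
block=290

# Small numbered input (lines "1" .. "1000") for trying the idea out.
seq 1 1000 > demo_src

# (NR - 1) % (2 * n) is the zero-based position of the current line
# inside its window of 2*n lines; awk prints the line when that
# position is in the first half, i.e. in the block to keep.
awk -v n="$block" '(NR - 1) % (2 * n) < n' demo_src > demo_dest
```

On the sample input this should keep lines 1-290 and 581-870. For the real data, the same awk command would be pointed at file_src and file_dest, reading the file once instead of once per block.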
 
