Edit a Huge one line file


 
# 1  
Old 06-12-2012

We have a huge file that consists of just one really long line, about 500 MB. I want to:
1. Count all the occurrences of a phrase
2. Replace the phrase with another.

Trying to open it using vi has not helped as it complains that it is too large. Can any script help? Please advise.

Thank you,
# 2  
Old 06-12-2012
To count:
Code:
perl -ne '@x=/phrase/g;print $#x+1 . "\n"' file

To replace:
Code:
perl -i -pe 's/phrase/replace/g' file

# 3  
Old 06-12-2012
Can you show a sample of this file? Maybe it just uses a separator other than newline, in which case awk could process it by changing the values of RS and ORS.
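If the file does turn out to use some other single-character separator, the RS/ORS idea might look like the sketch below. The `;` separator, the sample text, the file names and the replacement word are all assumptions for illustration; nothing is known yet about the real file format. Note that awk treats the phrase as a regular expression here.

```shell
# Hypothetical sample: records separated by ';' instead of newlines.
tmpdir=$(mktemp -d)
printf 'a horse;of course a horse;Mr. Ed' > "$tmpdir/sample.txt"

# Count: gsub() returns the number of substitutions it made, and "&"
# puts every match back unchanged, so the text itself is not altered.
count=$(awk -v RS=';' -v p='horse' \
    '{ n += gsub(p, "&") } END { print n + 0 }' "$tmpdir/sample.txt")
echo "occurrences: $count"

# Replace: read and write with the same separator. awk appends ORS after
# the last record as well, so the output gains one trailing ';'.
awk -v RS=';' -v ORS=';' -v p='horse' -v r='pony' \
    '{ gsub(p, r); print }' "$tmpdir/sample.txt" > "$tmpdir/sample.out"
cat "$tmpdir/sample.out"
```

Because each record is only the text between two separators, awk never has to hold the whole file in memory, provided the separator occurs reasonably often.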
# 4  
Old 06-12-2012
I've not dealt with a file consisting of a single 500 MB line before, so I played around and created one:

Code:
$ cat input
a horse is a horse, of course, of course, and no one can talk to a horse of course. That is, of course, unless the horse is the famous Mr. Ed.

which I kept concatenating onto itself with cat until it exceeded 500 MB.
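For anyone wanting to reproduce such a file: some cat implementations refuse to append a file to itself directly, so doubling through a temporary file is one way to get there. This is only a rough sketch, not necessarily how the file above was built; the function name, the shortened seed text and the deliberately tiny target size are made up for illustration. Newlines are left out so the result really is a single line.

```shell
# make_one_line_file SEED OUTFILE MIN_BYTES
# Writes SEED (without a newline) to OUTFILE, then doubles the file
# until it is at least MIN_BYTES long.
make_one_line_file() {
    printf '%s ' "$1" > "$2" || return 1
    while [ "$(( $(wc -c < "$2") ))" -lt "$3" ]; do
        cat "$2" "$2" > "$2.tmp" && mv "$2.tmp" "$2" || return 1
    done
}

workdir=$(mktemp -d)
make_one_line_file 'a horse is a horse, of course, of course.' "$workdir/input" 1000
wc -c < "$workdir/input"
```

Raising the third argument to something like 500000000 would give a file in the size range discussed here.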
Now we play:
Code:
[mute@geek ~]$ perl -ne '@x=/horse/g;print $#x+1 . "\n"' input
Killed
[mute@geek ~]$ grep -o horse input | wc -l
grep: input: Cannot allocate memory
0
[mute@geek ~]$ awk -F 'horse' '{print NF}' input
awk: cmd. line:1: fatal: grow_iop_buffer: iop->buf: can't allocate 536870914 bytes of memory (Cannot allocate memory)
[mute@geek ~]$ gawk -vRS="horse" 'END{print NR}' input
20331009

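Well, it seems there aren't many tools that aren't line-based. awk only uses a single-character record separator, unless it's gawk. Note that NR counts records, not separators, so when text follows the last "horse" the number of occurrences is one less than the figure printed above.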
# 5  
Old 06-12-2012
How much physical memory do you have on that machine, neutronscott?
# 6  
Old 06-13-2012
Well, 256 MB of RAM and 256 MB of swap. The OP has issues opening the file in vi, so I figured I'd run the tests on my low-end VPS.
As Corona pointed out, there must be a different record separator that could be used, so that a program doesn't attempt to load the entire line into memory.
More information is needed about the format of the file. I'm unaware of a standard Unix visual editor that would properly open such a file (though I'm very inexperienced in the subject; I can usually hack together a solution, but I don't have enough information here).
# 7  
Old 06-13-2012
Well, at least the problem with "vi" I can explain:

A file cannot be edited in the same place where it is stored. This is the reason why, e.g.,

Code:
sed '<something>' infile > outfile

works, while

Code:
sed '<something>' infile > infile

will not, because the shell truncates infile before sed gets to read it (the same is true for awk, etc.).

Some versions of sed (GNU sed among them) work around this fundamental limitation with in-place editing ("-i"), which works the same way interactive editors like vi do: they create a copy at program start, and only upon saving or finishing the work do they copy it over the original file.
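The same copy-then-replace pattern can be written out by hand for tools that lack an in-place option. This is a minimal sketch, assuming `mktemp` is available; the function name `edit_via_copy` and the sample file are hypothetical. It only addresses reading and writing the same file, not the memory needed for a very long line.

```shell
# edit_via_copy SED_EXPRESSION FILE
# Writes the edited text to a temporary file next to FILE and replaces
# the original only if sed succeeded.
edit_via_copy() {
    tmp=$(mktemp "$2.XXXXXX") || return 1
    if sed "$1" "$2" > "$tmp"; then
        mv -f "$tmp" "$2"
    else
        rm -f "$tmp"
        return 1
    fi
}

sample=$(mktemp)
printf 'a horse is a horse\n' > "$sample"
edit_via_copy 's/horse/pony/g' "$sample"
cat "$sample"
```

One design consequence: the replaced file takes on the ownership and permissions of the temporary file, and the filesystem needs room for both copies while the edit runs.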

vi typically uses /var/tmp by default, but can be configured to use other places too (to my knowledge, all versions of vi offer such an option via the .exrc file). If this filesystem does not have enough free space to hold the copy, an attempt to edit the file will fail even if there is enough memory to hold it.

Another limitation is the maximum line length: this is a system limit, laid down in the constant "LINE_MAX" in the system header file limits.h. Standard utilities are only required to cope with lines up to this length, although many implementations accept longer ones.
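The value in effect on a given system can be queried with the POSIX `getconf` utility rather than by reading the header. Here is a small sketch; the helper name `longer_than_line_max` is made up, and it relies on the file being a single line, so that its byte count equals its line length.

```shell
# Ask the system for its LINE_MAX value (in bytes).
line_max=$(getconf LINE_MAX)
echo "LINE_MAX: $line_max bytes"

# longer_than_line_max FILE
# Succeeds if a one-line FILE is longer than LINE_MAX.
longer_than_line_max() {
    [ "$(( $(wc -c < "$1") ))" -gt "$line_max" ]
}

f=$(mktemp)
printf 'short line' > "$f"
if longer_than_line_max "$f"; then
    echo "line exceeds LINE_MAX"
else
    echo "line is within LINE_MAX"
fi
```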

I hope this helps.