Replacing in huge text file


 
# 1  
Old 05-28-2011

I have 100 huge text files (~120 MB each), which comes to ~11 GB of data. The files contain nothing but digits: the value of phi to 10 billion digits!

I know it's huge! Here are the last few lines of one file:
Code:
0952899155 3233967444 3344925499 0276061529 7261968933 9683989044 3317145063 2771963944 5807139825 5785263278 : 999996
7076665287 1341193004 9994291160 2752806087 3098057018 7993954003 8272886989 6031743863 1213075239 5486559526 : 999997
4770078828 1376659981 9345095495 5822463216 7224348351 6200913437 5085852987 6060405404 9200077203 8324752051 : 999998
4334324783 5519682615 3340745027 7486245638 0533805208 0097461685 3057557984 4986386591 3281896020 9655014075 : 999999
6983266465 0958762067 5922249107 5144125222 8226019880 4186130718 6909500836 2519505480 1837059131 8941970031 : 1000000

Each line consists of 10 groups of 10 digits, followed by the line number. I want to remove the spaces, the trailing line number, and the line breaks. I tried doing that with sed, but I keep messing it up. I want the output to look like this:
Code:
095289915532339674443344925499027606152972619689339683989044331714506327719639445807139825578526327870766652871341193004999 and so on.......

I'm relatively new to shell, so please add a little explanation so that I can learn too.

Thanks a lot.

---------- Post updated at 08:28 AM ---------- Previous update was at 07:49 AM ----------

OK, after a lot of searching I finally got it:
Code:
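for (( i = 1; i <= 100; i++ ))
do
        # Input names are zero-padded to three digits: phi-001.txt ... phi-100.txt.
        # sed deletes the trailing " : N" counter and the spaces; tr deletes CRs, leftover spaces and newlines.
        cat "phi-$(printf "%.3d" "$i").txt" | sed 's/ : [0-9]*\| //g' | tr -d "\r \n" > "$i.txt"
done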

The file names are phi-001.txt, phi-002.txt, ..., phi-100.txt.

Is there a simpler way to do it?

---------- Post updated at 08:29 AM ---------- Previous update was at 08:28 AM ----------

By simpler I mean more CPU- and resource-efficient.
# 2  
Old 05-29-2011
If I understood the request correctly, try this inside your loop:

Code:
awk -F":" ' {gsub(" ","",$1); printf $1 } ' phi-${i} > $i.txt


Last edited by Peasant; 05-29-2011 at 03:03 AM..
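For reference, here is a minimal sketch of how the awk command above might be wired to the zero-padded file names from post #1 (phi-001.txt ... phi-100.txt). The helper names `flatten_phi` and `flatten_all` are made up for illustration and do not come from the thread. The sketch also prints through an explicit `"%s"` format instead of passing the data as the format string.

```shell
# flatten_phi FILE: print the digits of FILE as one unbroken string.
# (hypothetical helper name, not from the thread)
flatten_phi() {
    # Split on ":" so that $1 holds the ten digit groups, delete its spaces,
    # and print it without a newline. The " : N" counter and any trailing
    # carriage return end up in $2 and are never printed.
    awk -F':' '{ gsub(/ /, "", $1); printf "%s", $1 }' "$1"
}

# flatten_all: run flatten_phi over phi-001.txt ... phi-100.txt in the
# current directory, writing 1.txt ... 100.txt (the layout assumed from post #1).
flatten_all() {
    i=1
    while [ "$i" -le 100 ]; do
        src=$(printf 'phi-%03d.txt' "$i")   # zero-pad the index to three digits
        if [ -f "$src" ]; then              # skip indices with no input file
            flatten_phi "$src" > "$i.txt"
        fi
        i=$((i + 1))
    done
}
```

Calling `flatten_all` in the directory that holds the phi-NNN.txt files writes one output file per input. Building the name with `printf 'phi-%03d.txt'` avoids the missing-leading-zero mistake that `phi-${i}` invites.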
# 3  
Old 05-29-2011
I tried that; it's not removing the newlines or the " : xxxxx" data.

Anyway, I replaced the spaces with nothing and imported the files into a MySQL database.
Now the problem is that querying the database takes a huge amount of time.

So what's the best way to search for a substring in approximately 18 GB of data, and is it possible to create an 18 GB text file?
# 4  
Old 05-29-2011
Hmm, what's wrong with it?
Here it is on your input:
Code:
$ cat phi
0952899155 3233967444 3344925499 0276061529 7261968933 9683989044 3317145063 2771963944 5807139825 5785263278 : 999996
7076665287 1341193004 9994291160 2752806087 3098057018 7993954003 8272886989 6031743863 1213075239 5486559526 : 999997
4770078828 1376659981 9345095495 5822463216 7224348351 6200913437 5085852987 6060405404 9200077203 8324752051 : 999998
4334324783 5519682615 3340745027 7486245638 0533805208 0097461685 3057557984 4986386591 3281896020 9655014075 : 999999
6983266465 0958762067 5922249107 5144125222 8226019880 4186130718 6909500836 2519505480 1837059131 8941970031 : 1000000
$ awk -F":" ' {gsub(" ","",$1); printf $1 } ' phi
09528991553233967444334492549902760615297261968933968398904433171450632771963944580713982557852632787076665287134119300499942911602752806087309805701879939540038272886989603174386312130752395486559526477007882813766599819345095495582246321672243483516200913437508585298760604054049200077203832475205143343247835519682615334074502774862456380533805208009746168530575579844986386591328189602096550140756983266465095876206759222491075144125222822601988041861307186909500836251950548018370591318941970031$

# 5  
Old 05-29-2011
Yeah, I caught my error. Since I wanted three-digit numbers with leading zeros, I had messed it up.

It's working fine now. My next question is: is it possible to create an 18 GB file? I'm using x86_64 GNU/Linux (Ubuntu 10.10). And what would be the best way to search for a substring in this file?
# 6  
Old 05-29-2011
Yes, it's possible.
Check the block size of your disks and compare it with the table of maximum file sizes for your filesystem.
With a 4k block size, for example, you will be able to create a file of up to ~2 TB.
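One way to check the block size, sketched on the assumption that GNU coreutils `stat` is available (as it is on Ubuntu). The helper name `fs_block_size` is invented for illustration.

```shell
# fs_block_size [PATH]: print the block size, in bytes, of the filesystem
# that holds PATH (default: the current directory).
# Assumes GNU stat: -f reports on the filesystem rather than the file,
# and %S is that filesystem's fundamental block size.
fs_block_size() {
    stat -f -c '%S' -- "${1:-.}"
}

# Example: block size of the filesystem holding the current directory.
fs_block_size .
```

The actual file size limit also depends on the filesystem type, so treat the number as one input to the lookup rather than the full answer.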

Regarding substrings, you can use awk's substr function to print them.
If you tell us what you are trying to accomplish, folks here will probably suggest the best way.
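A small sketch of the substr idea, with made-up helper names (`digits_at`, `digits_at_bytes`). awk's `substr(s, start, length)` counts positions from 1. Because awk reads a whole line into memory, the second helper shows a byte-offset alternative with `tail -c` and `head -c` that may suit a multi-gigabyte single-line file better; it assumes one byte per digit and a `head` that supports `-c`.

```shell
# digits_at START LEN FILE: print LEN characters starting at the 1-based
# position START of every line in FILE, using awk's substr().
digits_at() {
    awk -v start="$1" -v len="$2" '{ print substr($0, start, len) }' "$3"
}

# digits_at_bytes START LEN FILE: the same slice taken by byte offset,
# without loading a whole line: tail -c +START begins output at byte START
# (1-based), and head -c LEN keeps only the first LEN bytes of that.
digits_at_bytes() {
    tail -c "+$1" "$3" | head -c "$2"
}
```

For example, `digits_at 3 4 sample.txt` on a file whose only line is 1618033988 prints 1803.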
# 7  
Old 05-29-2011
How can I check the block size?

What I'm trying to do is search the 10 billion digits of phi, a.k.a. the golden ratio, for number patterns, as it is said that phi will contain any number series provided you look long enough.

Then, once an efficient search function is done, one that also checks for repeated occurrences, it will be used to derive statistics about the numbers present, and more besides that I haven't thought through yet. Maybe linking it with the available stats, like probability and its relation, etc.

One method I'm considering is a multi-threaded application, to speed up the process and use less RAM.
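As a starting point for such a search, here is a rough sketch built on grep, assuming GNU grep (`-o` prints only the matched part, `-b` prefixes its byte offset, `-F` treats the pattern as a fixed string). The helper names are invented for illustration. Two caveats: `-o` reports non-overlapping matches only, and grep works line by line, so a single multi-gigabyte line may need more memory than is available. Keeping the digits in lines of moderate length, or splitting the input, is one way around that, at the cost of missing matches that straddle a line break.

```shell
# find_offsets PATTERN FILE: print the 0-based byte offset of each
# non-overlapping occurrence of the digit string PATTERN in FILE.
find_offsets() {
    # LC_ALL=C makes grep treat the input as plain bytes, which is usually faster.
    LC_ALL=C grep -obF -- "$1" "$2" | cut -d: -f1
}

# count_occurrences PATTERN FILE: count those occurrences.
count_occurrences() {
    LC_ALL=C grep -oF -- "$1" "$2" | wc -l
}
```

For example, `find_offsets 1618 1.txt` lists every position at which 1618 starts in the flattened digit file.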