Full Discussion: Large Text Files
Posted by caddyjoe77 in Shell Programming and Scripting on Tuesday 11th of July 2006, 12:23:27 AM (Post 302079492)
Large Text Files

Hi All

I have approximately 10 files that are at least 100 MB each. I am importing them into a database for output to the web. First I need to clean the files up so I don't end up with unnecessary rows in the DB. Below is what a file looks like:

Ignore the <TAB> annotations; they just mark where the tab characters are. Also ignore the Part A and Part B designations, which are only descriptors telling you how the CSV file is laid out.

Part A: (the header information)

"Report Type"<TAB>"This Report"

"Date: 200610"

"Report: All Files"

"more junk:" <TAB> "Even More Junk"

"FileName"<TAB>"FilePath"<TAB>"LastAccessed"<TAB>"LastModified"<TAB>"Owner"

Part B: (the actual data I want to scrunch together without blank lines)

"NameofFile"<TAB>"PathOfFiles"<TAB>"FileLastAccessed"<TAB>"FileLastModified"<TAB>"FileOwner"
"NameofFile"<TAB>"PathOfFiles"<TAB>"FileLastAccessed"<TAB>"FileLastModified"<TAB>"FileOwner"
"NameofFile"<TAB>"PathOfFiles"<TAB>"FileLastAccessed"<TAB>"FileLastModified"<TAB>"FileOwner"
"NameofFile"<TAB>"PathOfFiles"<TAB>"FileLastAccessed"<TAB>"FileLastModified"<TAB>"FileOwner"
and so on down the list for approximately 50 lines

Then "Some Report Exection Time"

Part A

Part B

Part A

Part B

Part A and Part B repeat over and over, listing all the files on a drive.

What I want to do is get rid of Part A completely and keep only the first
"FileName"<TAB>"FilePath"<TAB>"LastAccessed"<TAB>"LastModified"<TAB>"Owner"

These are large files, ranging from 100 to 500 MB, so I want something quick and efficient such as sed or awk, but I am unsure how to craft it.

I tried something like this in a sed script and called it via the GnuWin32 sed tool:

sed -f sedscript inputfile > outputfile

Here is what the sed script looked like:

/^$/d    # delete blank lines

s/"Report Type"<TAB>"This Report"//g    # blank out these strings globally
s/"Date: 200610"//g
s/"Report: All Files"//g
s/"more junk:" <TAB> "Even More Junk"//g

but I got some strange results. Only some of the blank lines disappeared, and it left blank lines I didn't think it should have, so maybe there is some hidden ASCII character there that I can't see?
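One guess about those leftover blank lines: since the report is generated on Windows, lines end in CR+LF, so a "blank" line may actually contain a lone carriage return, which /^$/d will not match. A small sketch of stripping the CR first and then deleting the truly empty lines (assuming GNU sed, which the GnuWin32 port is; sample.txt is a made-up test file, not my real data):

```shell
# Simulate a Windows-style file: CR+LF line endings, one "blank" line
printf 'line1\r\n\r\nline2\r\n' > sample.txt

# Strip the trailing CR first, then /^$/d can actually match
sed -e 's/\r$//' -e '/^$/d' sample.txt > cleaned.txt
cat cleaned.txt
# line1
# line2
```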

Basically, what I would like to know from you all is: am I doing this the best way? Any syntax help would also be appreciated. FYI, I have to do this on a Windows box, so I must use ActivePerl, the Perl that comes with Microsoft SFU, or the GnuWin32 ports of gawk and sed. I have plenty of memory (4 GB), a dual-core Xeon, and plenty of disk space.
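To make the goal concrete, here is a rough awk sketch of the filtering I have in mind (gawk from GnuWin32 should behave the same). The header strings and the "Some Report Exection Time" trailer are copied from my sample above, and report.csv/clean.csv are made-up file names; treat it as a starting point, not a tested solution:

```shell
# Build a tiny two-section sample shaped like my report (hypothetical content)
printf '"Report Type"\t"This Report"\n\n"Date: 200610"\n\n"FileName"\t"FilePath"\n"a.txt"\t"c:\\data"\n"Some Report Exection Time"\n"Report Type"\t"This Report"\n"FileName"\t"FilePath"\n"b.txt"\t"c:\\data"\n' > report.csv

# Drop blank lines and all the Part A junk; keep only the FIRST header row
awk '
/^\r?$/                { next }   # blank lines (tolerating a stray CR)
/^"Report Type"/       { next }
/^"Date: /             { next }
/^"Report: All Files"/ { next }
/^"more junk:"/        { next }
/Report Exection Time/ { next }   # trailer line between sections
/^"FileName"/          { if (header_seen++) next }
                       { print }
' report.csv > clean.csv
# clean.csv should now hold one header row followed by the data rows
```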

Thanks for the help/opinions.

Joe
 
