Large Text Files
Hi All
I have approximately 10 files that are each 100+ MB in size. I am importing them into a DB to output them to the web. What I need to do first is clean the files up so I don't have unnecessary rows in the DB.

Below is what the file looks like. Ignore the <TAB> annotations, as they just mark where the tabs are in the file. Also ignore the Part A and Part B designations, as they are only descriptors to show you what the format of the CSV file looks like.

Part A (the header information):

    "Report Type"<TAB>"This Report"
    "Date: 200610"
    "Report: All Files"
    "more junk:" <TAB> "Even More Junk"
    "FileName"<TAB>"FilePath"<TAB>"LastAccessed"<TAB>"LastModified"<TAB>"Owner"

Part B (the actual data I want to scrunch together without blank lines):

    "NameofFile"<TAB>"PathOfFiles"<TAB>"FileLastAccessed"<TAB>"FileLastModified"<TAB>"FileOwner"
    "NameofFile"<TAB>"PathOfFiles"<TAB>"FileLastAccessed"<TAB>"FileLastModified"<TAB>"FileOwner"
    "NameofFile"<TAB>"PathOfFiles"<TAB>"FileLastAccessed"<TAB>"FileLastModified"<TAB>"FileOwner"
    "NameofFile"<TAB>"PathOfFiles"<TAB>"FileLastAccessed"<TAB>"FileLastModified"<TAB>"FileOwner"

and on down the list for approximately 50 lines. Then:

    "Some Report Execution Time"

    Part A
    Part B
    Part A
    Part B

Part A and Part B repeat over and over again, obviously showing all the files on a drive.

What I want to do is get rid of Part A completely and only keep the first

    "FileName"<TAB>"FilePath"<TAB>"LastAccessed"<TAB>"LastModified"<TAB>"Owner"

These are large files ranging from 100-500 MB in size, so I want something quick and efficient such as sed or awk, but I am unsure how to craft it. I tried something like this in a sed script file and called it via the GNUWin32 sed tool:

    sed -f sedscript inputfilename > outputfilename

Here is what the sed script file looked like:

    /^$/d  #get rid of spaces
    s/"Report Type"<TAB>"This Report"//g  #globally replace these strings
    s/"Date: 200610"//g
    s/"Report: All Files"//g
    s/"more junk:" <TAB> "Even More Junk"//g

but I got some strange results. Only some of the blank lines disappeared, and it left some blank lines that I didn't think it should have, so maybe there is some hidden ASCII character there that I can't see?

Basically, what I would like from you all is: am I doing this the best way? Any syntax help would be appreciated.

FYI, I have to do this on a Windows box, so I have to use either ActivePerl, the Perl that comes with Microsoft SFU, or the GNUWin32 tools gawk and sed. I have enough memory (4 GB), a dual-core Xeon, and plenty of disk space.

Thanks for the help/opinions.

Joe
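
A minimal sketch of one way this cleanup might be done with awk instead of a list of sed substitutions. It rests on assumptions that are not confirmed anywhere in the post: every line worth keeping has exactly five tab-separated fields, the column-header row starts with "FileName", the other Part A lines and the execution-time line have fewer fields, and lines may end in a carriage return because the files come from Windows. The function name clean_report is made up for illustration.

```shell
#!/bin/sh
# clean_report: hypothetical helper name, not from the original post.
# Reads one report file and writes the cleaned rows to stdout.
clean_report() {
    awk -F '\t' '
        { sub(/\r$/, "") }          # strip a trailing carriage return (assumed Windows line endings)
        NF != 5 { next }            # assumed: junk lines, blank lines, and the timing line have fewer fields
        $1 == "\"FileName\"" {      # the column-header row
            if (seen_header++) next # keep only its first occurrence
        }
        { print }
    ' "$1"
}
```

Usage would be something like clean_report inputfilename > outputfilename. awk reads one line at a time, so the 100-500 MB size should not be a memory problem. At a Windows command prompt the single-quote quoting will probably not work as written; saving the awk program to its own file and running it with gawk -f is likely the easier route there.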
Quote:
I tried using /d as well, but sed kept telling me that it was missing an argument. So, the way I understand it, the script should look like this?

    s/"Report Set:"//g
    s/"All Files ERM"//g
    s/"All Files"//g
    s/"Object Name:"//g
    s/"06\/29\/2006 11:18:12"//g
    s/"Selection:"//g
    s/ All Files//g
    s/"Description: This Report *"//g
    /^$/d

There appears to be some Unicode in there also (尀 <-- characters like this). How do I get sed to see those when running the script? I put the following in:

    s/尀//g

and it returned an error on line 1: "Unknown Command".

Thanks!!
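
A rough sketch of how the leftover "blank" lines and the odd characters might be investigated. The pattern /^$/ only matches lines with nothing on them at all, so a line holding just a carriage return, a space, or a tab survives it. The helper names below are made up for illustration, and whether head and od are installed alongside the GNUWin32 sed is an assumption.

```shell
#!/bin/sh
# show_bytes: hypothetical helper; dumps the first few lines byte by byte
# so that hidden characters such as \r or \0 become visible.
show_bytes() {
    head -n 5 "$1" | od -c
}

# drop_blank_lines: hypothetical helper; deletes lines that are empty or
# contain only whitespace. [[:space:]] covers spaces, tabs, and carriage returns.
drop_blank_lines() {
    sed '/^[[:space:]]*$/d' "$1"
}
```

If the dump shows a \0 next to nearly every character, the file is probably UTF-16 rather than plain ASCII, which might also explain the stray 尀 characters; in that case it would likely need converting to a single-byte encoding before sed can match anything reliably.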