Large file data handling issue


 
# 8  
Old 11-14-2012
Yes, I do mean whether the file contains unix standard line terminators (line feed characters) to define records, or whether it is just a "stream". None of the standard unix utilities such as "sed" and "awk" are designed to deal with files which have no defined record format.
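For example (a hypothetical three-field stream with no line feed anywhere), awk sees the entire input arrive as one record:
Code:
printf 'a|b|c|' | awk '{ print NR, length($0) }'

This prints 1 7 — one record, seven bytes. On a large file that single "record" is the whole file, which is what breaks the line-oriented tools.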
# 9  
Old 11-14-2012
How can I overcome this limitation?
# 10  
Old 11-15-2012
Can you post the last bytes of the file from the output of the 'od' command:
Code:
od -c filename
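
For comparison, a file that does end in a line feed shows \n as its final byte; a hypothetical run:
Code:
printf 'abc|def\n' | od -c
0000000   a   b   c   |   d   e   f  \n
0000010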

# 11  
Old 11-15-2012
Code:
2225460 1 2 - 1 0 - 2 5 0 8 : 4 9 : 0
2225500 3 |
2225502
# 12  
Old 11-15-2012
You can cheat by telling awk that | is the record separator, so it won't try to process all 600K as one line.

Note that this reads the file twice, so no guesswork is needed to figure out which record is the last 'line'.

Code:
awk -v RS="|" 'NR==FNR { LAST=NR; next } FNR!=LAST { printf("%s%s", P, $0); P="|" } END { printf("\n"); }' inputfile inputfile
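
A quick check on a tiny hypothetical sample (GNU awk):
Code:
printf '1|2|3|' > sample
awk -v RS="|" 'NR==FNR { LAST=NR; next } FNR!=LAST { printf("%s%s", P, $0); P="|" } END { printf("\n"); }' sample sample

This prints 1|2 — the last record is dropped and the rest are rejoined with |.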


Last edited by Corona688; 11-15-2012 at 12:11 PM..
# 13  
Old 11-15-2012
So the file has no line feed; you just want to get rid of the last byte. If you have GNU "head", you can do:
Code:
head -c size-1 inputfile > outputfile

Or you can use dd to truncate the file in place (make a backup if you need one):
Code:
dd if=/dev/null of=inputfile bs=1 count=1 seek=size-1

Of course dd can do "head -c" too:
Code:
dd if=inputfile of=outputfile bs=size-1 count=1

If your input had a trailing line feed, you could truncate to size-2 and then append a line feed.
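
For completeness, a minimal sketch (assuming a POSIX shell and GNU head) that derives the size with wc -c instead of typing it in by hand:
Code:
# count the bytes, then copy everything except the last one
size=$(wc -c < inputfile)
head -c $((size - 1)) inputfile > outputfile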
# 14  
Old 11-15-2012
If you haven't got a solution yet, try this:
Code:
# Convert all '|' to newlines ('g' replaces every occurrence; \n in the replacement needs GNU sed)
sed 's/|/\n/g' filename > temp_file
# Determine the record number of the last line(s) you want to drop
wc -l temp_file
less -N temp_file
# Convert the newlines back to '|'; xxxx is the last record you want to keep
head -n xxxx temp_file | tr '\n' '|' > new_file
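
Run on a hypothetical sample, the round trip looks like this (GNU sed assumed, as above):
Code:
printf 'a|b|c|' > filename
sed 's/|/\n/g' filename > temp_file
wc -l temp_file                               # prints: 3 temp_file
head -n 2 temp_file | tr '\n' '|' > new_file
cat new_file                                  # prints: a|b|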

 