Large file data handling issue


 
# 8  
Old 11-14-2012
Yes, I do mean whether the file contains unix standard line terminators (line feed characters) to define records, or whether it is just a "stream". None of the standard unix utilities such as "sed" and "awk" are designed to deal with files which have no defined record format.
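For example (a hypothetical three-field stream with no line feed anywhere), awk sees the entire input arrive as one record:
Code:
printf 'a|b|c|' | awk '{ print NR, length($0) }'

This prints 1 7 — one record, seven bytes. On a large file that single "record" is the whole file, which is what breaks the line-oriented tools.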
# 9  
Old 11-14-2012
How can I overcome this limitation?
# 10  
Old 11-15-2012
Can you post the last bytes of the file from the output of the 'od' command:
Code:
od -c filename
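
For comparison, a file that does end in a line feed shows \n as its final byte; a hypothetical run:
Code:
printf 'abc|def\n' | od -c
0000000   a   b   c   |   d   e   f  \n
0000010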

# 11  
Old 11-15-2012
Code:
2225460 1 2 - 1 0 - 2 5 0 8 : 4 9 : 0
2225500 3 |
2225502
# 12  
Old 11-15-2012
You can cheat by telling awk that | is the record separator, so it won't try to process all 600K as one line.

Note that this reads the file twice, so no guesswork is needed to figure out which record is the last 'line'.

Code:
awk -v RS="|" 'NR==FNR { LAST=NR; next } FNR!=LAST { printf("%s%s", P, $0); P="|" } END { printf("\n"); }' inputfile inputfile
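
A quick check on a tiny hypothetical sample (GNU awk):
Code:
printf '1|2|3|' > sample
awk -v RS="|" 'NR==FNR { LAST=NR; next } FNR!=LAST { printf("%s%s", P, $0); P="|" } END { printf("\n"); }' sample sample

This prints 1|2 — the last record is dropped and the rest are rejoined with |.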


Last edited by Corona688; 11-15-2012 at 12:11 PM..
# 13  
Old 11-15-2012
So the file has no line feed; you just want to get rid of the last byte. If you have GNU "head", you can do:
Code:
head -c size-1 inputfile > outputfile

Or you can use dd to truncate the file in place (make a backup if you need one):
Code:
dd if=/dev/null of=inputfile bs=1 count=1 seek=size-1

Of course dd can do "head -c" too:
Code:
dd if=inputfile of=outputfile bs=size-1 count=1

If your input had a trailing line feed, you could truncate to size-2 and then append a line feed.
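
For completeness, a minimal sketch (assuming a POSIX shell and GNU head) that derives the size with wc -c instead of typing it in by hand:
Code:
# count the bytes, then copy everything except the last one
size=$(wc -c < inputfile)
head -c $((size - 1)) inputfile > outputfile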
# 14  
Old 11-15-2012
If you haven't got a solution yet, try this:
Code:
# Convert all '|' to newlines ('g' replaces every occurrence; \n in the replacement needs GNU sed)
sed 's/|/\n/g' filename > temp_file
# Determine the record number of the last line(s) you want to drop
wc -l temp_file
less -N temp_file
# Convert the newlines back to '|'; xxxx is the last record you want to keep
head -n xxxx temp_file | tr '\n' '|' > new_file
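
Run on a hypothetical sample, the round trip looks like this (GNU sed assumed, as above):
Code:
printf 'a|b|c|' > filename
sed 's/|/\n/g' filename > temp_file
wc -l temp_file                               # prints: 3 temp_file
head -n 2 temp_file | tr '\n' '|' > new_file
cat new_file                                  # prints: a|b|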

 