How to fix line breaks format text for huge files? Post 302590158 by binlib on Saturday 14th of January 2012 11:28:33 AM
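With so much interest in this topic, let's try another approach aimed at performance, since the file is huge. This C program maps the file into memory with mmap and edits it in place: whenever a line does not start with D, the newline before it is overwritten with a space, and processing stops at the trailer line (the first line starting with T). Nothing is copied or rewritten, so it should be very fast.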
Code:
linux$ cat t.c
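#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(int ac, char *av[])
{
  struct stat st;
  char *m, *e;
  int fd;

  if (ac != 2) { fprintf(stderr, "usage: %s file\n", av[0]); exit(1); }
  fd = open(av[1], O_RDWR);
  if (fd < 0) { perror("open"); exit(1); }
  if (fstat(fd, &st) < 0) { perror("fstat"); exit(1); }
  if (st.st_size == 0) return 0;  /* nothing to do; mmap rejects a zero length */
  m = mmap(0, st.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  if (m == MAP_FAILED) { perror("mmap"); exit(1); }
  e = m + st.st_size;

  /* The mapping is not NUL-terminated, so search with memchr and an explicit length. */
  while (m < e) {
    if ((m = memchr(m, '\n', e - m)) == 0) break;
    if (++m == e) break;         /* that newline was the last byte of the file */
    if (*m == 'T') break;        /* trailer line reached: stop */
    if (*m != 'D') m[-1] = ' ';  /* continuation line: join it to the previous record */
  }

  return 0;
}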

linux$ gcc t.c

linux$ cat t.dat
HEADER474687
D1356jkl ugbliuybikb 879870
898976098 9687680
D77656757 uhgliug liygoiygig
D98679hjh kjbgihguygfu ugliyh
kbygfluy9809
D8796870 kjlhuigiyig
TRAILER0008

linux$ cp t.dat t1.dat

linux$ ./a.out t1.dat

linux$ cat t1.dat
HEADER474687
D1356jkl ugbliuybikb 879870 898976098 9687680
D77656757 uhgliug liygoiygig
D98679hjh kjbgihguygfu ugliyh kbygfluy9809
D8796870 kjlhuigiyig
TRAILER0008

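linux$ cmp -l t.dat t1.dat
 41  12  40
118  12  40

cmp -l prints each differing byte's offset in decimal followed by the two byte values in octal: 12 is a newline and 40 is a space, so exactly two newlines (at bytes 41 and 118) were replaced and nothing else changed.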
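For anyone who wants to try the join rule without touching a real file, here is a minimal sketch of the same loop as a function over an in-memory buffer. The name join_continuations and its return value (a count of replaced newlines) are my own additions for illustration, not part of t.c. It uses memchr with an explicit length so it never reads past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original post): applies the same rule
 * as the loop in t.c to any writable buffer of len bytes.
 * A newline is replaced by a space when the following line does not start
 * with 'D'; scanning stops at the first line that starts with 'T'.
 * Returns the number of newlines that were replaced. */
size_t join_continuations(char *buf, size_t len)
{
    char *p = buf;
    char *end = buf + len;
    size_t joined = 0;

    assert(buf != NULL);
    while (p < end) {
        p = memchr(p, '\n', (size_t)(end - p));
        if (p == NULL) break;      /* no more newlines */
        if (++p == end) break;     /* newline was the last byte */
        if (*p == 'T') break;      /* trailer line: stop */
        if (*p != 'D') {           /* continuation line: join it */
            p[-1] = ' ';
            joined++;
        }
    }
    return joined;
}
```

Called on the contents of t.dat above, this should report two replacements, matching the two lines of cmp -l output. The mmap version is the same loop with buf pointing at the mapping and len taken from st.st_size.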
