Shell Programming and Scripting, post 302345926 by jesse on Thursday 20th of August 2009 02:58:51 PM
Append specific lines to a previous line based on sequential search criteria

I'll try to explain this as best I can. Let me know if anything is unclear.

I have large text files that contain data like this:

Code:
143593502  09-08-20 09:02:13 xxxxxxxxxxx          xxxxxxxxxxx          09-08-20 09:02:11 N     line 1 test
line 2 test
line 3 test
143593503  09-08-20 09:02:13 xxxxxxxxxxx          xxxxxxxxxxx          09-08-20 09:02:10 N     another message

Every line in the file that starts with a 9-digit number (followed by a date/time and so on) begins a unique message. In the example, the first 3 lines are really 1 message (with 2 newlines in it).

This leading 9-digit number increments sequentially.

What I want to do is get each message in its entirety onto 1 line. So I *want* the file to look like:

Code:
143593502  09-08-20 09:02:13 xxxxxxxxxxx          xxxxxxxxxxx          09-08-20 09:02:11 N     line 1 test line 2 test line 3 test
143593503  09-08-20 09:02:13 xxxxxxxxxxx          xxxxxxxxxxx          09-08-20 09:02:10 N     another message

Note that I'd like there to be a space between the additional lines in a single message.

My first idea was to remove ALL newlines from the file and replace them with spaces, and then work through that data inserting a newline before each of the sequence numbers.
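
A minimal sketch of that flatten-then-split idea, assuming POSIX `tr` and a `sed` that accepts `-E` (GNU and BSD sed both do). The function name `flatten_and_split` is made up for illustration, and the header pattern (9 digits, spaces, `yy-mm-dd hh:mm:ss`) is only inferred from the sample data above. The newline has to land in front of each sequence number so that every message starts its own line.

```shell
# Hypothetical helper: reads the log on stdin, writes one message per line.
flatten_and_split() {
  # Step 1: turn every newline into a space, then add a single newline back
  # so sed receives one properly terminated (very long) line.
  { tr '\n' ' '; echo; } |
  # Step 2: replace the space in front of each
  # "<9 digits> <yy-mm-dd> <hh:mm:ss> " header with a newline, then drop the
  # trailing space left over from the file's last newline.
  sed -E -e 's/ ([0-9]{9} +[0-9]{2}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} )/\
\1/g' -e 's/ +$//'
}
```

It would be called as `flatten_and_split < messages.log > joined.log` (both file names are placeholders). Because step 1 turns the whole file into a single line, this may be slow or memory-hungry on very large files.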

I believe this will solve the problem but unfortunately I don't have the chops to pull it off. I'm sure there are also other, potentially better, ways of solving the problem.
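
One possible alternative, sketched here with POSIX `awk`, is to stream through the file and keep a buffer for the current message, so the file never has to be flattened into one line. The name `join_messages` is hypothetical, and the sketch assumes a header is any line that starts with exactly nine digits followed by a space.

```shell
# Hypothetical helper: reads files given as arguments, or stdin if none.
join_messages() {
  awk '
    # Header line: nine digits and a space at the start of the line.
    /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] / {
      if (have) print buf      # flush the previous message
      buf = $0; have = 1
      next
    }
    # Any other line is a continuation: append it after a single space.
    { buf = (have ? buf " " $0 : $0); have = 1 }
    # Flush the last message at end of input.
    END { if (have) print buf }
  ' "$@"
}
```

Usage would be along the lines of `join_messages messages.log > joined.log`, with placeholder file names. The nine digits are spelled out rather than written as `[0-9]{9}` because some older awk implementations do not support interval expressions.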

One potential issue, I suppose, would be if one of the "extra" lines in a single message was miraculously the next 9-digit number in the sequence itself. I believe the chances of this are pretty slim, probably slim enough to make it a moot concern for me at this point... but nonetheless it's something to consider.
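
A rough sketch of how that risk could be narrowed with `awk`: treat a line as a new message only if it has the full header shape (9 digits, then a date and time) and its number is exactly one more than the previous header's. Both rules are assumptions drawn from the description above. In particular, "increments sequentially" is taken to mean "by 1 with no gaps", and the name `join_messages_strict` is made up. A continuation line that reproduces the whole header shape with the right next number would still fool it.

```shell
# Hypothetical helper: reads files given as arguments, or stdin if none.
join_messages_strict() {
  awk '
    function is_header(line,    id) {
      # Shape check: 9 digits, spaces, yy-mm-dd, space, hh:mm:ss, space.
      if (line !~ /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] +[0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] /)
        return 0
      id = substr(line, 1, 9) + 0
      # Sequence check (assumption): after the first header, the number
      # must be exactly the previous header number plus one.
      if (seen && id != last + 1) return 0
      last = id; seen = 1
      return 1
    }
    # A real header starts a new message; flush the one collected so far.
    is_header($0) { if (have) print buf; buf = $0; have = 1; next }
    # Everything else is glued onto the current message with one space.
    { buf = (have ? buf " " $0 : $0); have = 1 }
    END { if (have) print buf }
  ' "$@"
}
```

If the real data has gaps in the numbering, the sequence check would wrongly glue later messages onto an earlier one, so it should be relaxed or removed in that case.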

Ideally I would like to do this with either Perl or bash.

Thanks.
 

mbox(5)                                                         File Formats Manual                                                        mbox(5)

NAME
mbox - file containing mail messages

INTRODUCTION
The most common format for storage of mail messages is mbox format. An mbox is a single file containing zero or more mail messages.

MESSAGE FORMAT
A message encoded in mbox format begins with a From_ line, continues with a series of non-From_ lines, and ends with a blank line. A From_ line means any line that begins with the characters F, r, o, m, space:

     From god@heaven.af.mil Sat Jan  3 01:05:34 1996
     Return-Path: <god@heaven.af.mil>
     Delivered-To: djb@silverton.berkeley.edu
     Date: 3 Jan 1996 01:05:34 -0000
     From: God <god@heaven.af.mil>
     To: djb@silverton.berkeley.edu (D. J. Bernstein)

     How's that mail system project coming along?

The final line is a completely blank line (no spaces or tabs). Notice that blank lines may also appear elsewhere in the message.

The From_ line always looks like From envsender date moreinfo. envsender is one word, without spaces or tabs; it is usually the envelope sender of the message. date is the delivery date of the message. It always contains exactly 24 characters in asctime format. moreinfo is optional; it may contain arbitrary information.

Between the From_ line and the blank line is a message in RFC 822 format, as described in qmail-header(5), subject to >From quoting as described below.

HOW A MESSAGE IS DELIVERED
Here is how a program appends a message to an mbox file.

It first creates a From_ line given the message's envelope sender and the current date. If the envelope sender is empty (i.e., if this is a bounce message), the program uses MAILER-DAEMON instead. If the envelope sender contains spaces, tabs, or newlines, the program replaces them with hyphens.

The program then copies the message, applying >From quoting to each line. >From quoting ensures that the resulting lines are not From_ lines: the program prepends a > to any From_ line, >From_ line, >>From_ line, >>>From_ line, etc.

Finally the program appends a blank line to the message. If the last line of the message was a partial line, it writes two newlines; otherwise it writes one.

HOW A MESSAGE IS READ
A reader scans through an mbox file looking for From_ lines. Any From_ line marks the beginning of a message. The reader should not attempt to take advantage of the fact that every From_ line (past the beginning of the file) is preceded by a blank line.

Once the reader finds a message, it extracts a (possibly corrupted) envelope sender and delivery date out of the From_ line. It then reads until the next From_ line or end of file, whichever comes first. It strips off the final blank line and deletes the quoting of >From_ lines and >>From_ lines and so on. The result is an RFC 822 message.

COMMON MBOX VARIANTS
There are many variants of mbox format. The variant described above is mboxrd format, popularized by Rahul Dhesi in June 1995.

The original mboxo format quotes only From_ lines, not >From_ lines. As a result it is impossible to tell whether

     From: djb@silverton.berkeley.edu (D. J. Bernstein)
     To: god@heaven.af.mil

     >From now through August I'll be doing beta testing.
     Thanks for your interest.

was quoted in the original message. An mboxrd reader will always strip off the quoting.

mboxcl format is like mboxo format, but includes a Content-Length field with the number of bytes in the message. mboxcl2 format is like mboxcl but has no >From quoting. These formats are used by SVR4 mailers. mboxcl2 cannot be read safely by mboxrd readers.

UNSPECIFIED DETAILS
There are many locking mechanisms for mbox files. qmail-local always uses flock on systems that have it, otherwise lockf.

The delivery date in a From_ line does not specify a time zone. qmail-local always creates the delivery date in GMT so that mbox files can be safely transported from one time zone to another.

If the mtime on a nonempty mbox file is greater than the atime, the file has new mail. If the mtime is smaller than the atime, the new mail has been read. If the atime equals the mtime, there is no way to tell whether the file has new mail, since qmail-local takes much less than a second to run. One solution is for a mail reader to artificially set the atime to the mtime plus 1. Then the file has new mail if and only if the atime is less than or equal to the mtime.

Some mail readers place Status fields in each message to indicate which messages have been read.

SEE ALSO
maildir(5), qmail-header(5), qmail-local(8)
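
The >From quoting rule from the mbox(5) text above can be sketched with two small `sed` filters. This is only an illustration of the mboxrd rule as that page describes it. The function names `mboxrd_quote` and `mboxrd_unquote` are made up here, and neither function writes the From_ line or the trailing blank line that a real delivery program would add.

```shell
# Quote: prepend one ">" to any line that is "From " preceded by zero or
# more ">" characters (From_ lines, >From_ lines, >>From_ lines, ...).
mboxrd_quote() { sed 's/^>*From />&/'; }

# Unquote: strip exactly one leading ">" from lines that are "From "
# preceded by one or more ">" characters.
mboxrd_unquote() { sed 's/^>\(>*From \)/\1/'; }
```

Piping a message body through `mboxrd_quote` and then `mboxrd_unquote` should give back the original text, which is the property that mboxo quoting lacks.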