How to conditionally display and remove first line only?


 
# 1  
Old 12-13-2014

I have a maildir hierarchy of 90k eml files, and:

1) I would like to walk the tree and display the first line of any file whose first line begins with:

From -

That is "From space dash space", and only when those are the first seven characters of the first line in the file.

2) I would also like to count the number of files that have this first line.

3) Finally, I would like to remove that line completely from those files.

Thanks,

Jason

P.S. I only really need 3, but I would love to do 1 and 2 first. TIA
# 2  
Old 12-13-2014
If this is a homework assignment, it needs to be refiled in the Homework and Coursework Questions forum following the special rules (including filling out the homework questionnaire) required when submitting any question to that forum.

If this is not a homework assignment, please explain why a *.eml file would have a 1st line like that and why removing that line would "improve" things.
# 3  
Old 12-13-2014
Thanks for the response Don.
  1. It's not a homework assignment.
  2. Some time back I switched my email client from kmail to the Thunderbird Linux client.
  3. When I switched clients I maintained my maildir mailstore format, instead of converting everything to mbox.
  4. At some point, unbeknownst to me (either from the start or after an update), Thunderbird began inserting what was described to me as an mbox postmark at the front of every new EML file. (I was also told that it's not supposed to do that.)
  5. It's only now become an issue, because I want to convert the maildir mailstore over to mbox, and every conversion method/program I've tried, drops any message with the mbox postmark, without converting it. This means after conversion, I lose all messages after about Oct 2012.
  6. I tested it by manually removing the first line from several affected emails, and my conversion program successfully converted the message once the mbox postmark line was removed.
Here's a comparison of EML file headers: the first few from one file that converts successfully, and the first few from one file that won't convert.

Example EML file headers from a file that successfully converts
Code:
From company@company.com Tue Aug 07 19:10:33 2012
Return-path: <company@company.com>
Received: from [1.1.1.1] (helo=email-server.com) by
email-server.com with esmtp (Exim 4.69) id 12345-67890-AA for
user@email-server.com; Tue, 07 Aug 2012 19:10:33 +0200
Received: from exim by email-server.com with dspam-scanned (Exim
4.71) id 12345-67890-AA for user@email-server.com; Tue, 07 Aug
2012 19:10:32 +0200
Received: from exim by email-server.com with sa-scanned (Exim 4.71)
id 12345-67890-AA for user@email-server.com; Tue, 07 Aug 2012
19:10:32 +0200

Example EML file headers from a file that fails to convert
Code:
From - Wed Dec 5 11:13:43 2012
X-Account-Key: accountz
X-UIDL: 123456789.0000.email-server,S=47979
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
X-Mozilla-Keys: 
>From company@company.com Wed Dec 05 17:12:38 2012
Return-path: <company@company.com>
Received: from [1.1.1.1] (helo=email-server.com)
     by email-server.com with esmtp (Exim 4.69)
     id 12345-6789-00
     for user@email-server.com@email-server.com; Wed, 05 Dec 2012 17:12:38 +0100
Received: from exim by email-server.com with dspam-scanned (Exim 4.76)
     id 12345-6789-00
     for user@email-server.com@email-server.com; Wed, 05 Dec 2012 17:12:37 +0100
Received: from exim by email-server.com with sa-scanned (Exim 4.76)
     id 12345-6789-00
     for user@email-server.com@email-server.com; Wed, 05 Dec 2012 17:12:37 +0100

If that first line from the second set of headers:
Code:
From - Wed Dec 5 11:13:43 2012

is in the EML file, the conversion program skips the message. If it's removed, the conversion program converts the email. That's the reason for my request for help.

Jason
# 4  
Old 12-14-2014
Please take a look at grep.
# 5  
Old 12-14-2014
derekludwig,
The grep utility searches entire files; not just the first line.

jasn,
The following might not be highly efficient, but it shouldn't be too bad. Your 1st post in this thread talked about "eml" files, but post #3 talked about "EML" files. The following script processes every file in and under the directory where you run it whose name ends with a period followed by eml in lower case, upper case, or mixed case letters:
Code:
#!/bin/ksh
bfc=0
find . -type f -name '*.[Ee][Mm][Ll]' | while read -r file
do	read -r f1 f2 rest < "$file"
	if [ "$f1" = "From" ] && [ "$f2" = "-" ]
	then	# Bad file found...
		bfc=$((bfc + 1))
		printf 'bad file #%d: %s\n\tFrom - %s\n' $bfc "$file" "$rest"
		ed -s "$file" <<-EOF
			1d
			w
			q
		EOF
	fi
done

This was tested with ksh and bash, but should work with any POSIX conforming shell. (It won't work with a csh derivative nor with an original Bourne shell (such as /bin/sh on Solaris systems).)

Note that the characters before the EOF that ends the ed here-document must be tabs, not spaces.
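
Since jasn asked to see and count the affected files before changing anything, a read-only pass along the same lines might look like the sketch below. The function name report_postmarks is made up for illustration, and the sketch assumes that no file name contains a newline character. It reads only the first line of each file and modifies nothing.

```shell
#!/bin/sh
# report_postmarks DIR (hypothetical helper, not from the thread):
# print "file: first line" for every *.eml file under DIR whose first
# line starts with "From - ", then print the total. Nothing is modified.
report_postmarks() {
    dir=${1:-.}
    find "$dir" -type f -name '*.[Ee][Mm][Ll]' | {
        count=0
        while IFS= read -r file
        do
            first=
            # Read just the first line; skip empty or unreadable files.
            IFS= read -r first < "$file" || [ -n "$first" ] || continue
            case $first in
            "From - "*)
                count=$((count + 1))
                printf '%s: %s\n' "$file" "$first";;
            esac
        done
        # The counter is printed inside the same brace group as the loop,
        # so it is still visible in shells that run pipelines in subshells.
        printf 'files with postmark: %d\n' "$count"
    }
}
```

Running report_postmarks . from the top of the maildir tree covers tasks 1 and 2 without touching any file.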
# 6  
Old 12-14-2014
With respect to Don, I was thinking of this as three separate tasks:
Quote:
Originally Posted by jasn
1) I would like to walk the tree and display the first line of any file whose first line begins with:

From -

That is "From space dash space", and only when those are the first seven characters of the first line in the file.
Code:
find . -type f -name '*.[Ee][Mm][Ll]' -print | xargs head -1 | grep '^From - '

Quote:
Originally Posted by jasn
2) I would also like to count the number of files who have this first line
Code:
find . -type f -name '*.[Ee][Mm][Ll]' -print | xargs head -1 | grep -c '^From - '

Quote:
Originally Posted by jasn
3) I then would finally like to completely remove that line from those files
Code:
find . -type f -name '*.[Ee][Mm][Ll]' -print0 | xargs -0 sed -i -e  '2b' -e '/^From - /d'

Though I would create a backup of the original file...
Code:
find . -type f -name '*.[Ee][Mm][Ll]' -print0 | xargs -0 sed -ibak  -e '2b' -e '/^From - /d'

These can be removed at a later point in time, once the export is successful.
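
As a rough sketch of that later cleanup: with GNU sed, -ibak appends the suffix directly to the file name, so a message named msg.eml would be backed up as msg.emlbak. The helper name remove_backups is hypothetical, and the pattern assumes that nothing else in the tree has a name ending in emlbak.

```shell
#!/bin/sh
# remove_backups DIR (hypothetical helper): delete the *.emlbak copies
# left behind by "sed -ibak" once the conversion has been verified.
remove_backups() {
    find "${1:-.}" -type f -name '*.[Ee][Mm][Ll]bak' -exec rm -f {} +
}
```

Replacing -exec rm -f {} + with -print first shows what would be deleted.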

One question: does the >From line have to be restored, so that the original sender information is imported?
Code:
find . -type f -name '*.[Ee][Mm][Ll]' -print0 | xargs -0 sed -ibak   -e '/^$/,$b' -e 's/>From /From /' -e '2b' -e '/^From - /d'

# 7  
Old 12-14-2014
Quote:
Originally Posted by derekludwig
With respect to Don, I was thinking of this as three separate tasks:

Code:
find . -type f -name '*.[Ee][Mm][Ll]' -print | xargs head -1 | grep '^From - '

That is a reasonable way to do what you're trying to do. I just didn't see how a list of up to 90000 lines with no indication of what file they came from would be very useful. I assumed it would be more useful to count the files and print the 1st line of the files that need to be modified in a single step.
Quote:
Originally Posted by derekludwig
Code:
find . -type f -name '*.[Ee][Mm][Ll]' -print | xargs head -1 | grep -c '^From - '

This should work OK if you just want to get a count of the files that jasn wanted to edit.
Quote:
Originally Posted by derekludwig
Code:
find . -type f -name '*.[Ee][Mm][Ll]' -print0 | xargs -0 sed -i -e  '2b' -e '/^From - /d'

Though I would create a backup of the original file...
Code:
find . -type f -name '*.[Ee][Mm][Ll]' -print0 | xargs -0 sed -ibak  -e '2b' -e '/^From - /d'

These can be removed at a later point in time, once the export is successful.
These pipelines edit every file; not just those that jasn wanted to modify. And, if there are any lines (other than the 1st two lines) that start with From - , they will be removed from the file even if they were data in the middle of a mail message. Did you intend to use -e '2,$b' instead of -e '2b'?
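
Both concerns can be addressed together. The sketch below checks line 1 first, so files without the postmark are never rewritten, and it uses the '2,$b' form so that only line 1 can be deleted. It avoids sed -i, which is not part of POSIX sed. The function name strip_postmark and the .bak suffix are assumptions made for illustration.

```shell
#!/bin/sh
# strip_postmark FILE (hypothetical helper): if the first line of FILE
# starts with "From - ", keep a copy as FILE.bak and rewrite FILE
# without that line. Files that do not match are left untouched.
strip_postmark() {
    file=$1
    first=
    # Read just the first line; do nothing for empty or unreadable files.
    IFS= read -r first < "$file" || [ -n "$first" ] || return 0
    case $first in
    "From - "*)
        # '2,$b' branches past the remaining commands for lines 2 to the
        # end, so the delete command is only ever tried on line 1.
        cp -p "$file" "$file.bak" &&
        sed -e '2,$b' -e '/^From - /d' "$file.bak" > "$file";;
    esac
}

# Possible use over the whole tree (assumes no newlines in file names):
#   find . -type f -name '*.[Ee][Mm][Ll]' |
#   while IFS= read -r f; do strip_postmark "$f"; done
```

A line in the message body that happens to start with "From - " survives, because the delete command is never reached after line 1.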
Quote:
Originally Posted by derekludwig
One question: does the >From line have to be restored, so that the original sender information is imported?
Code:
find . -type f -name '*.[Ee][Mm][Ll]' -print0 | xargs -0 sed -ibak   -e '/^$/,$b' -e 's/>From /From /' -e '2b' -e '/^From - /d'

I'm not following the logic of what you're trying to do with this pipeline. Are you trying to remove every line starting with >From - or From - and change every other occurrence of >From to From anywhere else on a line that appears after line 2 and before the first empty line in each file?