Want to remove all lines but not latest 50 lines from a file


 
# 15  
Old 10-26-2013
@Scrutinizer, @Don Cragun and @alister: you are right.
Code:
$> strace sed -ni '2,$p' file
....
open("file", O_RDONLY)                  = 3
...
open("./sedvS2Dny", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
...
write(4, "SIsxCwUouy\n", 11)            = 11
write(4, "qWngJUOkrc\n", 11)            = 11
...
rename("./sedvS2Dny", "file")           = 0
...
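
The same point can be checked without strace by comparing inode numbers before and after the edit. This is only a minimal sketch, assuming GNU sed (BSD sed wants `-i ''`) and a scratch file created with `mktemp`; neither the file nor the variable names come from the trace above.

```shell
# Assumption: GNU sed; the scratch file and variable names are illustrative.
f=$(mktemp)
printf '%s\n' one two three > "$f"

before=$(ls -i "$f" | awk '{print $1}')   # inode before the edit
sed -ni '2,$p' "$f"                        # drop the first line "in place"
after=$(ls -i "$f" | awk '{print $1}')    # inode after the edit

# Differing inode numbers mean the file was replaced via a temp file + rename.
printf 'before=%s after=%s\n' "$before" "$after"
```

If the two numbers differ, the original file was never modified; a new one took its name, which is why free space for a second copy is needed.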

Emanuele
# 16  
Old 10-26-2013
Hi, Scrutinizer.
Quote:
Originally Posted by Scrutinizer
If you really can't tail to another filesystem first, perhaps:
Code:
ex -sc '$-49999,$d | x' file

...
Some versions of ex/ed will use a temporary file: see post #4 at Delete first 100 lines from a BIG File - The UNIX and Linux Forums

( From that post, you and I went on to discuss temporary files as a kind of safety issue. )

Best wishes ... cheers, drl
# 17  
Old 10-26-2013
In-place move to the top of the file of the final 5 lines without using any variables or temp files:
Code:
dd if=/dev/null of=fname bs=1 seek=$(tail -n5 fname | tee >(wc -c) 1<>fname)

Note: The process substitution implementations I've seen require either /dev/fd or named pipes. If /dev/fd isn't available, and if the shell cannot create the fifo in its usual temp dir, the shell may need to be informed of a suitable alternative location.
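
The truncation half of that one-liner can be tried in isolation. A small sketch, assuming a dd that follows the POSIX rule of truncating the output at the seek offset when conv=notrunc is absent; the scratch file and the offset of 4 are made up for illustration.

```shell
# Illustrative scratch file; 4 stands in for the byte count that wc -c supplies.
f=$(mktemp)
printf '0123456789' > "$f"

# if=/dev/null copies nothing: dd seeks 4 bytes into the output file
# and truncates it at that offset.
dd if=/dev/null of="$f" bs=1 seek=4 2>/dev/null

size=$(wc -c < "$f" | tr -d ' ')
printf '%s bytes left: %s\n' "$size" "$(cat "$f")"
```

In the full command, the seek value is the number of bytes tail wrote to the start of the file, so everything after the moved lines is cut off.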

Regards,
Alister

Last edited by alister; 10-26-2013 at 06:26 PM.. Reason: Corrected text to match tail's five line count
# 18  
Old 10-26-2013
Hi Alister, that looks ingenious. I presume you mean tail -n 50000 and wc -m? Could you elaborate on why the file needs to be redirected read/write on stdout?

I tried it on Linux with tail -n 5 and this works fine,

But on OSX 10.9 (bash 3 and bash 4) I got:
Code:
dd: no value specified for seek

and the file ended up consisting of the last 5 lines, an empty line, and the last 5 lines again.

On Solaris (bash 3 and using XPG4 utilities):
Code:
dd: bad argument: "11"

On HPUX (bash 4)
Code:
0+0 records in
0+0 records out

And the file became 0 length

On AIX:
Code:
dd: 0511-056 The command parameter 11 is not correct.
Usage: dd [if=InputFile] [of=OutputFile] [cbs=Number] [fskip=Number]
          [skip=Number] [seek=Number] [count=Number] [bs=Number] [span=yes|no]
          [ibs=Number] [obs=Number] [files=Number] [conv=Parameter[, ...]]


Last edited by Scrutinizer; 10-26-2013 at 06:43 PM..
# 19  
Old 10-26-2013
Hi.

Both solutions, the shell variable and the "dd move", are about the same as far as system resources go. I used alister's amended solution; his first (the one that went out with the email notification) failed.

For a file of 14,754,910 lines, about 1 GB, the times for the shell solution keeping the last 50,000 lines were:
Code:
real	0m36.738s
user	0m1.624s
sys	0m8.321s

and for the clever "dd move" solution:
Code:
real	0m44.048s
user	0m1.408s
sys	0m8.725s

There were 2 wc executions for verification in both runs. Leaving those out:
Code:
real	0m29.443s
user	0m0.560s
sys	0m7.012s

and
Code:
real	0m30.536s
user	0m0.220s
sys	0m6.460s

The results actually surprised me -- I thought the shell would be slower. I saw paging being used a few times, slightly more often with the dd. The shell was run first, so cache advantage, if any, went to dd ... cheers, drl

This system:
Code:
 OS: Debian 5.0.8 lenny
 Kernel: x86_64 Linux 2.6.26-2-amd64
 Uptime: 59d 21h 58m
 CPU: AMD Athlon 64 3000+ @ 1.802GHz
 GPU: NVidia GeForce FX 5200
 RAM: 1170MB / 3024MB


Last edited by drl; 10-26-2013 at 06:39 PM.. Reason: Typo.
# 20  
Old 10-26-2013
Hi, Scrutinizer.

Thank you for catching the line count mismatch. My testing involved moving just the last 5 lines of a 15 line file. I changed the text to match -n5.

No, I definitely did not intend wc -m. dd will seek bs*seek bytes, not characters.
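
A quick way to see the difference, as a sketch only: the two-byte UTF-8 encoding of "é" is written with octal escapes so the result does not depend on how this text is stored. The count reported by wc -m depends on the current locale, so no particular value is claimed for it here.

```shell
# "é" in UTF-8 is the two bytes 0xC3 0xA9, followed here by a newline.
bytes=$(printf '\303\251\n' | wc -c | tr -d ' ')   # bytes: what dd's seek needs
chars=$(printf '\303\251\n' | wc -m | tr -d ' ')   # characters: locale dependent

printf 'bytes=%s chars=%s\n' "$bytes" "$chars"
```

Whenever the two counts differ, feeding the character count to seek would truncate the file at the wrong offset.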

Thank you for testing this construct on so many platforms. It appears that all of the failures are the result of leading blank(s) emitted by most wc implementations but not by GNU wc (with which I tested). This hypothesis is consistent with your error messages and supported by a quick peek at code.
From a BSD wc implementation used by OS X:
Code:
	if (dochar || domulti) {
		tcharct += charct;
		(void)printf(" %7ju", charct);
	}

Perhaps you could confirm by using tr to delete any spaces?
Code:
dd if=/dev/null of=fname bs=1 seek=$(tail -n5 fname | tee >(wc -c | tr -d ' ') 1<>fname)

Alternatively, depending on how dd converts the text to an int, leading blanks might not be a problem if protected from shell parsing. Perhaps simply double quoting the command substitution will do (although this feels fragile):
Code:
dd if=/dev/null of=fname bs=1 seek="$(tail -n5 fname | tee >(wc -c) 1<>fname)"

The read/write nature of tee's stdout is not relevant. The utility of <> in this case is that it leaves the file descriptor's offset at 0 and allows tail's output (via tee) to write to the beginning of the file without truncation (which dd will perform afterwards). >> and > are both unsuitable since the former appends all writes and the latter truncates before the first write.
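
The three redirections can be compared side by side. A minimal sketch with an illustrative scratch file; the variable names are made up for the demonstration.

```shell
f=$(mktemp)

printf 'AAAAAAAAAA\n' > "$f"
printf 'bb' 1<> "$f"     # opened at offset 0 without truncation: overwrites in place
rw=$(cat "$f")

printf 'AAAAAAAAAA\n' > "$f"
printf 'bb' >> "$f"      # every write lands after the existing data
ap=$(cat "$f")

printf 'AAAAAAAAAA\n' > "$f"
printf 'bb' > "$f"       # file is emptied before the first write
tc=$(cat "$f")

printf '1<> gives: %s\n>> gives: %s\n> gives: %s\n' "$rw" "$ap" "$tc"
```

Only the first form leaves the old tail of the file in place behind the new data, which is what lets dd cut it off afterwards.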

Regards,
Alister

---------- Post updated at 06:20 PM ---------- Previous update was at 06:02 PM ----------

Quote:
Originally Posted by drl
The results actually surprised me -- I thought the shell would be slower.
I'm not surprised. My command substitution, process substitution, and the pipeline within it require more work to establish, and, once running, require more context switching to move data around.

One advantage of using all those pipes is that memory consumption is not a function of the amount of data to be moved.

That said, neither performance nor resource consumption motivated me. I was only trying to see if I could accomplish it without reading the file twice and without explicit memory storage.

Quote:
Originally Posted by drl
Code:
v1=$( tail -3 $FILE )
rm $FILE
echo "$v1" > $FILE

Using echo with arbitrary text can produce unexpected results. It's best to use printf '%s\n' "$v1".
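
For instance, a value that happens to look like an option is enough to trip echo up. A small sketch with a made-up value; what echo prints varies between shells, so only the printf result is relied upon.

```shell
v1='-n'                        # hypothetical "arbitrary text"

# echo may take the value as an option (bash's builtin prints nothing here).
echo "$v1"

# printf with an explicit format prints the argument verbatim.
out=$(printf '%s\n' "$v1")
printf '%s\n' "$out"
```

Backslash sequences in the data are a second source of surprises, since some echo implementations interpret them and others do not.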

More importantly, that solution cannot handle trailing blank lines properly, since command substitution always strips them. This shortcoming may be perfectly acceptable in some situations and an utter dealbreaker in others.
Code:
$ printf '%s\n' 1 2 3 '' '' '' | wc -l
6

$ v1=$(printf '%s\n' 1 2 3 '' '' '')

$ echo "$v1" | wc -l
3
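
If the trailing blank lines matter, one common workaround is to append a sentinel character inside the command substitution and strip it afterwards. The following is only a sketch under that assumption, using an illustrative scratch file rather than $FILE:

```shell
f=$(mktemp)                          # illustrative scratch file
printf '%s\n' 1 2 3 '' '' '' > "$f"

# The sentinel "x" means the substitution no longer ends in newlines,
# so nothing is stripped; the sentinel is then removed again.
v1=$(tail -n 3 "$f"; printf x)
v1=${v1%x}

printf '%s' "$v1" > "$f"             # no added "\n": v1 already ends in newlines
lines=$(wc -l < "$f" | tr -d ' ')
printf 'lines kept: %s\n' "$lines"
```

The last three lines of the sample are all empty, so without the sentinel the variable would end up empty and the file would lose them.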

Regards,
Alister

Last edited by alister; 10-26-2013 at 07:40 PM..
# 21  
Old 10-26-2013
Hi, alister.
Quote:
Originally Posted by alister
Using echo with arbitrary text can produce unexpected results. It's best to use printf '%s\n' "$v1" .
Thanks for the reminder. I usually use my function:
Code:
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }

print like echo, because the name is shrtr, and uses printf -- but sometimes I forget ... cheers, drl
 