Want to remove all lines but not latest 50 lines from a file


 
# 22  
Old 10-26-2013
Quote:
Originally Posted by alister
Hi, Scrutinzer.
[..]
Perhaps you could confirm by using tr to delete any spaces?
Code:
dd if=/dev/null of=fname bs=1 seek=$(tail -n5 fname | tee >(wc -c | tr -d ' ') 1<>fname)

Alternatively, depending on how dd converts the text to an int, leading blanks might not be a problem if protected from shell parsing. Perhaps simply double quoting the command substitution will do (although this feels fragile):
Code:
dd if=/dev/null of=fname bs=1 seek="$(tail -n5 fname | tee >(wc -c) 1<>fname)"

I retested and can confirm that without the leading spaces in the output of wc it now works on all platforms, except HP-UX, where the file still became 0 bytes as before. The change made no difference there, since wc does not produce leading spaces on that platform anyway. I did notice this:
Code:
hpux64$ echo hello | tee >(wc -c) >/dev/null
0

which on the other platforms produced 6.
Quote:
The read/write nature of tee's stdout is not relevant. The utility of <> in this case is that it leaves the file descriptor's offset at 0 and allows tail's output (via tee) to write to the beginning of the file without truncation (which dd will perform afterwards). >> and > are both unsuitable since the former appends all writes and the latter truncates before the first write.
Nice!




_________
Quote:
Originally Posted by drl
Hi, alister.

Thanks for the reminder. I usually use my function:
Code:
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }

It prints like echo, the name is shorter, and it uses printf -- but sometimes I forget ... cheers, drl
There seems to be a difference:
Code:
$ echo 1 "2 3"
1 2 3
$ pe 1 "2 3"
12 3

Wouldn't
Code:
pe() { printf "%s" "$@"; printf "\n"; }

produce the same result?

Perhaps:
Code:
pe() { printf "%s" "$1"; [ $# -gt 1 ] && shift && printf " %s" "$@"; printf "\n"; }

-edit- maybe even this:
Code:
pe () { printf "%s\n" "$*" ; }


# 23  
Old 10-26-2013
Hi, Scrutinizer.

I don't recall if I had the intention of omitting spaces, but, yes, that's how it works. I may need to add an additional function to handle both. I'll think on that.

The templates I have for setting up the environment, displaying input, posting output, and sometimes doing the comparison now run into the 50s. Although they are under version control, it can be a pain updating the ones that are related and share common code. My goal is to encourage the user to actually copy and paste the scripts, possibly stage the data files, run the code, and see if the results match mine. That also keeps me honest.

Thanks for the feedback ... cheers, drl
# 24  
Old 10-26-2013
Note that the last paragraph of the description section of the current POSIX Standard's tail utility man page is:
Quote:
Tails relative to the end of the file may be saved in an internal buffer, and thus may be limited in length. Such a buffer, if any, shall be no smaller than {LINE_MAX}*10 bytes.
so tail -n 50000 is likely to fail without warning on some systems; that is, it may give some number of lines between 10 and 50,000 from the end of the file, with a good chance that the first of those lines is missing one or more bytes from its start. I'm almost positive that is how the tail utility in UNIX System V Release 4 behaved, and I have no idea how many versions of tail originally derived from that source have since been modified to handle unlimited numbers of bytes or lines when grabbing data from the end of a file.
# 25  
Old 10-27-2013
Hi.

For the version of tail that I used, as noted above:
Code:
       If the first character of N (the number of bytes or lines) is a `+',
       print beginning with the Nth item from the start of each file,
       otherwise, print the last N items in the file.  N may have a multiplier
       suffix: b 512, kB 1000, K 1024, MB 1000*1000, M 1024*1024, GB
       1000*1000*1000, G 1024*1024*1024, and so on for T, P, E, Z, Y.

so it looks like the GNU folks have made provisions for some large quantities of data. I have not looked at the source, however.

For the tests that I ran, I numbered all the lines, counted the final tail (50,000 of 14,000,000), and printed the first and last line of the output file. All seemed correct and intact. The system tail and the version in perl at http://cpansearch.perl.org/src/CWEST/ppt-0.14/bin/tail both return essentially immediately when asked for the last 5 lines of the 14M line file. I noticed some seeks in the perl version, so I'd guess that both those versions are using such performance techniques.

I don't disagree that some utilities will produce wrong results without warning, but that is not the right thing to do, and such systems should be avoided if possible, especially if one routinely tries to solve problems like the one we are addressing here. Another possibility is to use the Perl version, although I didn't test it rigorously (and its calling sequence is slightly different, but easily changed).

I might worry more about the shell's capacity, although I didn't have any trouble; I recall a test I did on an early version of SunOS and was pleasantly surprised at its capacity for variable storage. Of course, mileage may vary for specific cases.

Best wishes ... cheers, drl

# 26  
Old 10-28-2013
For discussion:
Code:
tac < file | head -5 | tac | tee file >/dev/null

will keep the data stream within the pipes: because of the input redirection, file is opened for reading early by the shell, before tee writes to it. The whole thing is pretty fast, although it reads the entire file, and it keeps the inode of file. I'm not sure if and where the caveats are. I tested it with 10 million lines, bringing them down to 5000, on my Linux system.
# 27  
Old 10-28-2013
Hi, RudiC.
Quote:
Originally Posted by RudiC
For discussion:
Code:
tac < file | head -5 | tac | tee file >/dev/null

...
Using the data and framework in the previous posts (14M lines, about 1 GB), I get:
Code:
tac: standard input: read error: Inappropriate ioctl for device
  30442 2334740 data1

not 50000 as desired.

The timing was:
Code:
real	0m35.072s
user	0m2.732s
sys	0m10.137s

The number of lines in the result varied: out of 4 runs it reached 50000 once; the other 3 gave 27K, 29K, and 30K lines, all of the latter with the ioctl message.

This is just for this 3GB workstation, a single datapoint.

Best wishes ... cheers, drl

# 28  
Old 10-29-2013
Quote:
Originally Posted by RudiC
For discussion:
Code:
tac < file | head -5 | tac | tee file >/dev/null

will keep the data stream within the pipes: because of the input redirection, file is opened for reading early by the shell, before tee writes to it. The whole thing is pretty fast, although it reads the entire file, and it keeps the inode of file. I'm not sure if and where the caveats are. I tested it with 10 million lines, bringing them down to 5000, on my Linux system.
As drl demonstrated, this solution is not reliable.

The problem with this approach is that we cannot make any assumptions about when tee will truncate the file. The elements of a pipeline need not be created and scheduled sequentially. Even if they are, the number and size of the pipeline's buffers (both in the kernel and userspace) impose an upper limit on the amount of data that can be moved before truncation.
Code:
$ seq 100000 > data
$ wc -l < data
100000

$ cp data data.bkp
$ tac data | tee data >/dev/null
tac: data: read error
$ wc -l < data
24420

$ cp data.bkp data
$ tac data | head -n 100000 | tee data >/dev/null
tac: data: read error
$ wc -l < data
46266

$ cp data.bkp data
$ tac data | head -n 100000 | head -n 100000 | tee data >/dev/null
tac: data: read error
$ wc -l < data
68111

$ cp data.bkp data
$ tac data | head -n 100000 | head -n 100000 | head -n 100000 | tee data >/dev/null
tac: data: read error
$ wc -l < data
98148

With enough buffering, you might get lucky ...
Code:
$ cp data.bkp data
$ tac data | head -n 100000 | head -n 100000 | head -n 100000 | head -n 100000 | tee data >/dev/null
$ wc -l < data
100000

... or not.
Code:
$ cp data.bkp data
$ tac data | head -n 100000 | head -n 100000 | head -n 100000 | head -n 100000 | tee data >/dev/null
tac: data: read error
$ wc -l < data
80399

Regards,
Alister