Sponsored Content
Operating Systems Solaris Huge (repeated Entry) text files Post 53483 by axl on Friday 16th of July 2004 02:20:24 AM
Old 07-16-2004
Huge (repeated Entry) text files

Somebody HELP!

I have a huge log file (TEXT) 76298035 bytes.

It's a logfile of IMEIs and IMSIS that I get from my EIR node.

Here is how the contents of the file look like:

000000,
1 33016382000913 652020100423994
1 33016382002353 652020100430743
1 33017035101003 652020100441736
....
....
....
235800,
1 35725620987678 652020100545862


Problem is, the file is to some degree made huge by repeated entries ( repeated lines - non consecutive).

I have tried this code to eliminate the repeated entries:

cat myfile | sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' | tee mynewfile | wc -l

but it takes forever and stops midway, at 024000 instead of 235800.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Delete repeated word in text file

Hi expert, I am using C shell. And i trying to delete repeated word. Example file.txt: BLUE YELLOW RED VIOLET RED RED BLUE WHITE YELLOW BLACK and i wan store the output into a new file: BLUE (6 Replies)
Discussion started by: vincyoxy
6 Replies

2. Shell Programming and Scripting

Extract multiple repeated data from a text file

Hi, I need to extract data from a text file in which data has a pattern. I need to extract all repeated pattern and then save it to different files. example: input is: ST*867*000352214 BPT*00*1000352214*090311 SE*1*1 ST*867*000352215 BPT*00*1000352214*090311 SE*1*2 ... (5 Replies)
Discussion started by: apjneeraj
5 Replies

3. UNIX for Advanced & Expert Users

Best way to search for patterns in huge text files

I have the following situation: a text file with 50000 string patterns: abc2344536 gvk6575556 klo6575556 .... and 3 text files each with more than 1 million lines: ... 000000 abc2344536 46575 0000 000000 abc2344536 46575 4444 000000 abc2344555 46575 1234 ... I... (8 Replies)
Discussion started by: andy2000
8 Replies

4. Shell Programming and Scripting

Comparing 2 huge text files

I have this 2 files: k5login sanwar@systems.nyfix.com jjamnik@systems.nyfix.com nisha@SYSTEMS.NYFIX.COM rdpena@SYSTEMS.NYFIX.COM service/backups-ora@SYSTEMS.NYFIX.COM ivanr@SYSTEMS.NYFIX.COM nasapova@SYSTEMS.NYFIX.COM tpulay@SYSTEMS.NYFIX.COM rsueno@SYSTEMS.NYFIX.COM... (11 Replies)
Discussion started by: linuxgeek
11 Replies

5. Shell Programming and Scripting

How to find repeated string in a text file

I have a text file where I need to find the string = ST*850* This string is repetaed several times in the file, so I need to know how many times it appears in the file, this is the text files: ISA*00* *00* *08*925485USNR *ZZ*IMSALADDERSP... (13 Replies)
Discussion started by: cucosss
13 Replies

6. Shell Programming and Scripting

How to fix line breaks format text for huge files?

Hi, I need to correct line breaks for huge files (more than 1MM records in a file) and then format it properly. Except the header and trailer, each record starts with 'D'. Requirement:Scan the whole file except the header and trailer records and see if any of the records start with... (19 Replies)
Discussion started by: kikionline
19 Replies

7. Shell Programming and Scripting

Finding most repeated entry in a column and giving the count

Please can you help in providing the most repeated entry in the 2nd column and give its count Here is an input file 1, This , is a forum 2, This , is a forum 1, There , is a forum 2, This , is not right Here the most repeated entry is "This" and count is 3 So output... (4 Replies)
Discussion started by: necro98
4 Replies

8. Shell Programming and Scripting

remove brackets and put it in a column and remove repeated entry

Hi all, I want to remove the remove bracket sign ( ) and put in the separate column I also want to remove the repeated entry like in first row in below input (PA156) is repeated ESR1 (PA156) leflunomide (PA450192) (PA156) leflunomide (PA450192) CHST3 (PA26503) docetaxel... (2 Replies)
Discussion started by: manigrover
2 Replies

9. Shell Programming and Scripting

Find repeated word and take sum of the second field to it ,for all the repeated words in awk

Hi below is the input file, i need to find repeated words and sum up the values of it which is second field from the repeated work.Im trying but getting no where close to it.Kindly give me a hint on how to go about it Input fruits,apple,20,fruits,mango,20,veg,carrot,12,veg,raddish,30... (11 Replies)
Discussion started by: 100bees
11 Replies

10. UNIX for Beginners Questions & Answers

Export lines that have first entry repeated 5 times or above

Dears i want to extract lines only that have first entry repeated 3 times or above , ex data : -bash-3.00$ cat INTCONT-IS.CSV M205-00-106_AMDRN:1-0-6-22,12-662-4833,intContact,2016-11-15 02:32:16,50 M205-00-106_AMDRN:1-0-23-17,12-616-0462,intContact,2016-11-15 02:32:23,50... (5 Replies)
Discussion started by: is2_egypt
5 Replies
LOGSAVE(8)                                                    System Manager's Manual                                                   LOGSAVE(8)

NAME
logsave - save the output of a command in a logfile SYNOPSIS
logsave [ -asv ] logfile cmd_prog [ ... ] DESCRIPTION
The logsave program will execute cmd_prog with the specified argument(s), and save a copy of its output to logfile. If the containing directory for logfile does not exist, logsave will accumulate the output in memory until it can be written out. A copy of the output will also be written to standard output. If cmd_prog is a single hyphen ('-'), then instead of executing a program, logsave will take its input from standard input and save it in logfile logsave is useful for saving the output of initial boot scripts until the /var partition is mounted, so the output can be written to /var/log. OPTIONS
-a This option will cause the output to be appended to logfile, instead of replacing its current contents. -s This option will cause logsave to skip writing to the log file text which is bracketed with a control-A (ASCII 001 or Start of Header) and control-B (ASCII 002 or Start of Text). This allows progress bar information to be visible to the user on the console, while not being written to the log file. -v This option will make logsave to be more verbose in its output to the user. AUTHOR
Theodore Ts'o (tytso@mit.edu) SEE ALSO
fsck(8) E2fsprogs version 1.44.1 March 2018 LOGSAVE(8)
All times are GMT -4. The time now is 07:44 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy