Operating Systems: Solaris. Post 53483 by axl on Friday 16th of July 2004 02:20:24 AM
Huge (repeated Entry) text files

Somebody HELP!

I have a huge text log file (76298035 bytes).

It's a log of IMEIs and IMSIs that I get from my EIR node.

Here is what the contents of the file look like:

000000,
1 33016382000913 652020100423994
1 33016382002353 652020100430743
1 33017035101003 652020100441736
....
....
....
235800,
1 35725620987678 652020100545862


The problem is that the file is made huge, at least in part, by repeated entries (identical lines that are not consecutive).

I have tried this code to eliminate the repeated entries:

cat myfile | sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' | tee mynewfile | wc -l

but it takes forever and stops midway, at 024000 instead of 235800.
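
A note on why this is probably slow, plus a rough sketch of an alternative. The sed script above keeps every unique line seen so far in the hold space and rescans all of it for each new line, so the work grows roughly with the square of the file size. The early stop may be a buffer limit in the sed implementation, but that is a guess. A common alternative is to remember seen lines in an awk array. The sketch below assumes duplicates should be dropped across the whole file and that the timestamp markers (digits followed by a comma) should always be kept. The sample data, including the 000200 marker, is made up for illustration. On Solaris, nawk or /usr/xpg4/bin/awk may be needed in place of the default awk.

```shell
# Small sample in the same layout as the log (values are for illustration only).
cat > myfile <<'EOF'
000000,
1 33016382000913 652020100423994
1 33016382002353 652020100430743
1 33016382000913 652020100423994
000200,
1 33016382002353 652020100430743
1 35725620987678 652020100545862
EOF

# Print a line if it is a timestamp marker (only digits, then a comma),
# or if this exact line has not been seen earlier in the file.
# seen[] holds one entry per unique line, so memory grows with the
# number of distinct lines, not with the total file size.
awk '/^[0-9]+,$/ || !seen[$0]++' myfile > mynewfile

# Count the lines that remain.
wc -l < mynewfile
```

Each line is looked up once in the array, so the file is processed in a single pass. If the timestamp markers do not need special handling, the pattern can be reduced to `!seen[$0]++` alone.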
 
