07-16-2004
Huge (repeated Entry) text files
Somebody HELP!
I have a huge plain-text log file (76298035 bytes).
It's a log of IMEIs and IMSIs that I get from my EIR node.
Here is what the contents of the file look like:
000000,
1 33016382000913 652020100423994
1 33016382002353 652020100430743
1 33017035101003 652020100441736
....
....
....
235800,
1 35725620987678 652020100545862
The problem is that part of the file's size comes from repeated entries (duplicate lines that are not consecutive).
I have tried this code to eliminate the repeated entries:
cat myfile | sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P' | tee mynewfile | wc -l
but it takes forever and stops midway, at 024000 instead of 235800.
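A possible alternative, offered as a sketch rather than a tested fix for this exact file: the sed script above appends every unique line to the hold space and re-scans that whole buffer with a regex for each new input line, so the work grows roughly quadratically with the number of unique lines. Whether that also explains the stop at 024000 is not clear from the post. awk can instead remember each line in an associative array and print it only the first time it appears. The file names sample.log and sample.dedup and the sample contents below are made up for illustration; the approach assumes the set of unique lines fits in memory.

```shell
#!/bin/sh
# Hypothetical sample standing in for the real EIR log (names and data are assumptions).
cat > sample.log <<'EOF'
000000,
1 33016382000913 652020100423994
1 33016382002353 652020100430743
1 33016382000913 652020100423994
000100,
1 33016382002353 652020100430743
1 35725620987678 652020100545862
EOF

# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++ is true
# only for first occurrences; awk's default action prints those lines in order.
awk '!seen[$0]++' sample.log > sample.dedup

# Count the lines that survived.
wc -l < sample.dedup
```

This keeps the first occurrence of every line and preserves the original order, so the time markers such as 000000, stay where they were. If the same IMEI/IMSI pair should be kept once per time block rather than once per file, the array would have to be cleared at each marker line, which this sketch does not do.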