The actual file has approximately 50 million such lines, with larger numbers.
Now I need a way to pull out all the unique sets.
Example
Cases satisfying the condition:
3 2 is unique, since for first-column value 3 the second column is always "2" and nothing else;
the same goes for 5 1, which appears only once.
Cases that do not satisfy it:
1 2 etc., because 1 has other numbers in the second column.
Already tried:
1. A MySQL database: does not work.
2. grep and awk: very slow; my script has been running for more than 3 days.
3. Sorting on the first column and comparing with the second column: I need help with any Unix/Linux tool that can do this.
4. The comm command also seems to choke on this much data.
5. Perl with binmode, reading in blocks, etc.: slower than grep and egrep.
Any ideas on how to get these details?
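Since the goal is just "first-column keys whose second column is always the same value", one sort-plus-awk pass over the data may be enough, with no per-key grep. A minimal sketch, assuming whitespace-separated columns; sample.log here is a small stand-in for the real 50M-line file:

```shell
# Small sample standing in for the big log (whitespace-separated columns).
printf '1 2\n1 3\n3 2\n5 1\n1 2\n' > sample.log

# sort -u collapses repeated identical pairs; after that, any key that still
# appears on more than one (adjacent) line pairs with several second values,
# so we keep only keys that occupy exactly one line.
sort -u sample.log | awk '
    $1 == prev { dup = 1; next }
    { if (NR > 1 && !dup) print saved; dup = 0; prev = $1; saved = $0 }
    END { if (NR && !dup) print saved }
'
# -> prints "3 2" then "5 1"
```

sort can spill to disk, so this scales to files that do not fit in memory, at the cost of one full sort of the data.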
Last edited by chakrapani; 03-29-2010 at 08:59 AM..
#!/bin/bash
# lookfor LOGFILE KEYFILE
# For every key (first-column value) listed in KEYFILE, count how many
# distinct second-column values it has in LOGFILE; keep it if exactly one.
function lookfor {
    log=$1
    keys=$2
    echo -e "NOW Processing file: $keys\n\n"
    while read -r key
    do
        # Make sure the length of the key we are looking for is at least 5
        if [ "${#key}" -ge 5 ]
        then
            # Match the key between its "|" delimiters (as the second grep
            # already did) so e.g. 123 does not also match 1234; then count
            # the distinct second-column values
            val=$(grep -F "|$key|" "$log" | awk '{ print $2 }' | sort -u | wc -l)
            if [ "$val" -eq 1 ]
            then
                echo -n "$key"
                grep -F "|$key|" "$log" >> fnd.txt
            fi
            echo -n "."   # Show some activity
        else
            echo -n "-"   # Show some activity that I rejected this number
        fi
    done < "$keys"
}
lookfor "hugelog1.log" "firstRowUniq.1"
lookfor "hugelog2.log" "firstRowUniq.2"
lookfor "hugelog3.log" "firstRowUniq.3"
I have split hugelog into several files on Linux, and each firstRowUniq file
holds the unique values of the first column of the corresponding hugelog file.
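For reference, a firstRowUniq-style key file can be produced in one pipeline. A sketch, again assuming whitespace-separated columns and using a small sample file in place of the real log:

```shell
# Build a firstRowUniq-style file: unique first-column values of the log.
printf '3 2\n5 1\n1 2\n1 4\n3 2\n' > hugelog.sample
awk '{ print $1 }' hugelog.sample | sort -u > firstRowUniq.sample
cat firstRowUniq.sample   # -> prints 1, 3, 5 (one per line)
```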
OK, what I need is for the set to remain the same: 1 2 and 1 4 should not both be in the output, because the second number changes from 2 to 4; but if a key always has the same set, that is fine, e.g. 3 2 and 5 1 in the original example.
What my code does is take the unique first-column numbers, grep for each number in the file, and then check with wc -l that only one distinct second number appears.
It works, but it is slow.
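The whole grep-per-key loop can also be replaced by a single awk pass that keeps one hash entry per key: remember the first second-column value seen, and mark the key bad if a different value ever shows up. A sketch under the same whitespace-separated assumption; note that memory grows with the number of distinct first-column values, which for 50M lines may or may not fit:

```shell
# Small sample standing in for the big log.
printf '3 2\n5 1\n1 2\n1 4\n3 2\n' > sample2.log

# One pass, no sort: val[k] holds the first second-column value seen for key
# k, bad[k] marks keys that later pair with a different value. awk's for-in
# order is unspecified, so sort the result for a stable listing.
awk '
    !($1 in val) { val[$1] = $2; next }
    val[$1] != $2 { bad[$1] = 1 }
    END { for (k in val) if (!(k in bad)) print k, val[k] }
' sample2.log | sort
# -> prints "3 2" then "5 1"
```

Repeated identical pairs (3 2 appearing twice here) stay unique, matching the "same set all the time is ok" rule above.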