Using grep -f works, but with two files as large as indicated will take its (serious) time, and may eventually run out of memory. Try
, then use the resultant file in similar way (uniq -u) to extract unique values from either original file.
Comparison of both approaches on ~20k files: EDIT: Times spent for two files with roughly 4E6 entries each, and about 1E6 lines overlap (on a two processor linux host):
Hello
I do want to write a script which will check any errors say "-error" in the log file then have to send email to the concern person . And the concern person will correct the error .
Next time if the script runs eventhough the error has been corrected it will ... (1 Reply)
Hi all,
I'm having some trouble with a shell script that I have put together to search our web pages for links to PDFs.
The first thing I did was:
ls -R | grep .pdf > /tmp/dave_pdfs.outWhich generates a list of all of the PDFs on the server. For the sake of arguement, say it looks like... (8 Replies)
I have a file that is 20 - 80+ MB in size that is a certain type of log file.
It logs one of our processes and this process is multi-threaded. Therefore the log file is kind of a mess. Here's an example:
The logfile looks like: "DATE TIME - THREAD ID - Details", and a new file is created... (4 Replies)
I need help in the following script. I want to grep the sql errors insert into the error table and exit the shell script if there is any error, otherwise keep running the scripts.
Here is my script
#!/bin/csh -f
source .orapass
set user = $USER
set pass = $PASS
cd /opt/data/scripts
echo... (2 Replies)
Hi guys - below is my script that is checking for current file, size and timestamp.
However I added a "grep" feature in it (line in red), but not getting the desired result.
I am trying to acheive in output:
1. Show me the file name, timestamp, size and grep'ed words
It would be a... (2 Replies)
Hi all,
I have problem with searching hundreds of CSV files, the problem is that search is lasting too long (over 5min).
Csv files are "," delimited, and have 30 fields each line, but I always grep same 4 fields - so is there a way to grep just those 4 fields to speed-up search.
Example:... (11 Replies)
Hello everybody,
I'm still slowly treading my way into bash scripting (without any prior programming experience) and hence my code is mostly what some might call "creative" if they meant well :D
I have created a script that serves its purpose but it does so very slowly, since it needs to work... (4 Replies)
This is my first experience writing unix script. I've created the following script. It does what I want it to do, but I need it to be a lot faster. Is there any way to speed it up?
cat 'Tax_Provision_Sample.dat' | sort | while read p; do fn=`echo $p|cut -d~ -f2,4,3,8,9`; echo $p >> "$fn.txt";... (20 Replies)
Hi,
I've written a ksh script that read a file and parse/filter/format each line. The script runs as expected but it runs for 24+ hours for a file that has 2million lines. And sometimes, the input file has 10million lines which means it can be running for more than 2 days and still not finish.... (9 Replies)
Hello experts,
we have input files with 700K lines each (one generated for every hour). and we need to convert them as below and move them to another directory once.
Sample INPUT:-
# cat test1
1559205600000,8474,NormalizedPortInfo,PctDiscards,0.0,Interface,BG-CTA-AX1.test.com,Vl111... (7 Replies)
Discussion started by: prvnrk
7 Replies
LEARN ABOUT MOJAVE
uniq
UNIQ(1) BSD General Commands Manual UNIQ(1)NAME
uniq -- report or filter out repeated lines in a file
SYNOPSIS
uniq [-c | -d | -u] [-i] [-f num] [-s chars] [input_file [output_file]]
DESCRIPTION
The uniq utility reads the specified input_file comparing adjacent lines, and writes a copy of each unique input line to the output_file. If
input_file is a single dash ('-') or absent, the standard input is read. If output_file is absent, standard output is used for output. The
second and succeeding copies of identical adjacent input lines are not written. Repeated lines in the input will not be detected if they are
not adjacent, so it may be necessary to sort the files first.
The following options are available:
-c Precede each output line with the count of the number of times the line occurred in the input, followed by a single space.
-d Only output lines that are repeated in the input.
-f num Ignore the first num fields in each input line when doing comparisons. A field is a string of non-blank characters separated from
adjacent fields by blanks. Field numbers are one based, i.e., the first field is field one.
-s chars
Ignore the first chars characters in each input line when doing comparisons. If specified in conjunction with the -f option, the
first chars characters after the first num fields will be ignored. Character numbers are one based, i.e., the first character is
character one.
-u Only output lines that are not repeated in the input.
-i Case insensitive comparison of lines.
ENVIRONMENT
The LANG, LC_ALL, LC_COLLATE and LC_CTYPE environment variables affect the execution of uniq as described in environ(7).
EXIT STATUS
The uniq utility exits 0 on success, and >0 if an error occurs.
COMPATIBILITY
The historic +number and -number options have been deprecated but are still supported in this implementation.
SEE ALSO sort(1)STANDARDS
The uniq utility conforms to IEEE Std 1003.1-2001 (``POSIX.1'') as amended by Cor. 1-2002.
HISTORY
A uniq command appeared in Version 3 AT&T UNIX.
BSD December 17, 2009 BSD