Need help improving my script. Post 302970903 by Don Cragun on Wednesday 13th of April 2016 11:18:02 AM
I was thinking of taking Scrutinizer's suggestions a step further: getting rid of the unneeded GC_ports.txt file completely and using a single awk script to produce the four desired output files (GC_tcpinbound.txt, GC_tcpoutbound.txt, GC_udpinbound.txt, and GC_udpoutbound.txt). As Bakunin explained, find -exec command + instead of find -exec command \; reduces the number of times zcat is invoked, because + passes many pathnames to each invocation instead of one. I added the -v option to zcat to get a visible indication that progress is being made while the script runs.
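
As a small illustration of that difference, here is a sketch that counts how many times the command is started with each form. The scratch directory and the three empty files are made up for the demonstration and have nothing to do with your asalog* files:

```shell
# Create a scratch directory holding three empty files.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"

# Each start of the sh -c command prints one line, so counting
# the lines counts the invocations.
per_file=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | grep -c run)
batched=$(find "$dir" -type f -exec sh -c 'echo run' sh {} + | grep -c run)

echo "with \\; the command ran $per_file times; with + it ran $batched time(s)"

# Clean up the scratch directory.
rm -rf "$dir"
```

With only three short pathnames, the + form should need just one invocation. With thousands of files it may take a few, but still far fewer than one per file.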

Please remove /var/network_logs/gc/archive/GC_ports.txt if that file is still present from an earlier run of your script. Then see if something more like:
Code:
#!/bin/ksh
InputDir='/var/network_logs/gc/archive'
OutputDir='/home/kenneth.cramer/asa'

# Display the start time...
date +'Search started at: %m/%d/%Y %T%nProcessing asalog files...'

# Find and uncompress asalog* files that are less than a week old...
find "$InputDir"/asalog* -mtime -7 -exec zcat -v {} + |
awk -v OutputDir="$OutputDir" '
!/Built/ || /10\.10\.120\.145/ {
	# Discard lines that do not contain "Built" and lines that contain
	# IP address 10.10.120.145.
	next
}
{	# Throw away unneeded data...
	$0 = $10 OFS $11 OFS $15 OFS $18
	# and change "/"s and ":"s to spaces (recomputing field boundaries).
	gsub("[/:]", " ")
}
$1 == "inbound" {
	# Process inbound records.
	if(seen[$1, $2, $3, $4, $6, $7, $8]++) {
		# Discard duplicates.
		next
	}
	# Following assumes we only have TCP and UDP inbound records.
	# Print to one of two inbound text files.
	print $2, $3, $4, $6, $7, $8 > (OutputDir "/GC_" \
	    (($2 == "TCP") ? "tcp" : "udp") "inbound.txt")
}
$1 == "outbound" {
	# Process outbound records.
	if(seen[$1, $2, $6, $7, $3, $4, $5]++) {
		# Discard duplicates.
		next
	}
	# Following assumes we only have TCP and UDP outbound records.
	# Print to one of two outbound text files.
	print $2, $6, $7, $3, $4, $5 > (OutputDir "/GC_" \
	    (($2 == "TCP") ? "tcp" : "udp") "outbound.txt")
}'

# Compress the output files into a single file for transport off the machine...
printf '\nCompressing files for transport...\n'

tar -czvf "$OutputDir/GC_ports.tgz" "$OutputDir"/GC_*.txt

# Print end time and statistics...
date +'%nProcess completed for Gold Camp at: %m/%d/%Y %T'
times

runs a little faster for you.
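
To see what the reshaping and duplicate-discarding steps do, here is a minimal sketch run on three made-up records. The f1 through f8 placeholders, the (x) fields, the high port numbers, and the 10.20.114.121 address are my inventions for illustration; only the positions of fields 10, 11, 15, and 18 follow what the script above assumes. It handles outbound records only and prints to standard output instead of to the GC_*.txt files:

```shell
# Three hypothetical records. The second differs from the first only in
# the last port number, which is not part of the duplicate key.
reshaped=$(printf '%s\n' \
  'f1 f2 f3 f4 f5 f6 f7 f8 Built outbound UDP connection 1 for internal:10.20.114.120/53 (x) to intmgmt:10.20.100.48/40000 (x)' \
  'f1 f2 f3 f4 f5 f6 f7 f8 Built outbound UDP connection 2 for internal:10.20.114.120/53 (x) to intmgmt:10.20.100.48/40001 (x)' \
  'f1 f2 f3 f4 f5 f6 f7 f8 Built outbound TCP connection 3 for internal:10.20.114.121/443 (x) to intmgmt:10.20.100.48/40002 (x)' |
awk '
/Built/ {
	# Keep only the four interesting fields...
	$0 = $10 OFS $11 OFS $15 OFS $18
	# then split "name:address/port" apart; changing $0 makes awk
	# recompute $1 through $8.
	gsub("[/:]", " ")
	# The array element is 0 the first time a key is seen, so the
	# post-increment is false then and true for every repeat.
	if(seen[$1, $2, $6, $7, $3, $4, $5]++) {
		next
	}
	print $2, $6, $7, $3, $4, $5
}')
printf '%s\n' "$reshaped"
```

This should print one UDP line and one TCP line; the second record is dropped as a duplicate of the first.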

I know that you said you wanted to use bash, but I generally find that ksh will run scripts like this a little faster. These shells use different output formats for the output from the times built-in utility, but should otherwise produce identical results for this script. (You may want to try both a few times with real data to see how much of a difference in speed there is between bash and ksh on your system.)

When run with InputDir and OutputDir set to "." and with six copies of a compressed version of the sample input you provided in post #3 in files named asalog_test1.Z through asalog_test6.Z, it produces the output file GC_udpoutbound.txt containing:
Code:
UDP intmgmt 10.20.100.48 internal 10.20.114.120 53

and the compressed tar archive file GC_ports.tgz and writes the following to standard output and standard error output:
Code:
Search started at: 04/13/2016 07:37:58
Processing asalog files...
./asalog_test1.Z:	   43.4%
./asalog_test2.Z:	   43.4%
./asalog_test3.Z:	   43.4%
./asalog_test4.Z:	   43.4%
./asalog_test5.Z:	   43.4%
./asalog_test6.Z:	   43.4%

Compressing files for transport...
a ./GC_udpoutbound.txt

Process completed for Gold Camp at: 04/13/2016 07:37:58
user	0m0.00s
sys	0m0.00s

while it runs.

In comparison, your script from post #3 in this thread (run with bash, but converted to use files in the current directory) produces the output:
Code:
Search started at:
04/13/2016 07:39:54



Sorting data into proper files.



Compressing files for transport
a ./GC_ports.txt
a ./GC_udpoutbound.txt
Process completed for Gold Camp at:
04/13/2016 07:39:54


0m0.002s 0m0.017s
0m0.011s 0m0.014s

As I said before, I imagine that a good portion of the time in this script is spent decompressing the asalog* files and, depending on the sizes of your four output files, recompressing the data as tar creates the compressed archive. Still, I'm hoping that running fewer processes and reading and writing the uncompressed data fewer times will make this noticeably faster when you're working with real data.

Note that you provided sample UDP outbound records as sample input data and you showed sample output data for TCP inbound records. So, I'm not sure that I produced the correct output formats for inbound or outbound records (since the output format for inbound records is not the same as the output format for outbound records).

Hope this helps,
- Don
 
