Splitting files using awk and reading filename value from input data
I have a process that requires me to read data from huge log files and find the most recent entry on a per-user basis. The number of users may fluctuate wildly month to month, so I can't code for it with names or a set number of variables to capture the data, and the files are large so I don't want to read them several times.
The entries of interest contain a particular string, so I can extract just them from the overall log file, and I have a way to split the output into separate files on a per-user basis. My plan is then to read the last line of each file created with tail -1, with the filename giving me the user account in question.
My boss, however, worries about false-positive data matches for my expression (by chance or maliciously) that might try to overwrite a critical file.
My data has a syslog-type date in it, which means doing a sort -u is proving tricky too. I've got as far as splitting the data out to files under /tmp/logs named splitlog.rbatte1 or similar, but if field 11 were ever */../../etc/passwd then potentially I would be in trouble.
The date is the first three fields and, as far as I am aware, a valid user name would be in field 11, but I can't be certain of that.
A simplified part of the code would be:-
I have considered adding tr -d "\/" to strip out the characters, but now that it's been raised, I'm concerned that there may be other things I'm not considering.
Is there a better way to work here, potentially with awk getting the equivalent of basename "$11" or of the shell's variable substitution "${11##*/}"?
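For what it's worth, awk has no basename function, but sub() with a greedy pattern gives the same effect as the shell's ##*/ expansion. The sketch below is only an illustration: the function name safe_name, the REJECTED placeholder and the particular whitelist of characters are my own assumptions, not anything from the original code. It strips everything up to the last slash in field 11 and then only accepts what is left if it looks like a plain account name.

```shell
#!/bin/sh
# Hypothetical helper: print a filename-safe version of field 11 per input line.
# Assumption: a valid user name consists only of letters, digits, "_", "." and
# "-" and does not start with "." or "-". Adjust the whitelist to suit your site.
safe_name() {
    awk '{
        name = $11
        sub(/.*\//, "", name)      # drop everything up to the last "/", like ${var##*/}
        if (name ~ /^[A-Za-z0-9_][A-Za-z0-9_.-]*$/)
            print name             # looks like a plain account name
        else
            print "REJECTED"       # empty, dot-leading or otherwise odd value
    }' "$@"
}
```

Whitelisting the characters you do expect is usually safer than deleting the ones you don't (as tr -d would), because anything not thought of in advance is refused by default. It may be safer still to reject any value containing a slash rather than strip it, since a stripped name could end up being attributed to a genuine user.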
Any suggestions welcome. Perhaps there is a better design overall that will find the last entry on a per-user basis. The log is thankfully written in time order, so the last in the file by user name is the last by time already.
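Since the log is already in time order, one possible alternative design avoids writing per-user files at all: let awk hold the latest matching line for each user in an associative array and print the lot at the end. Nothing from the data is ever used as a filename, so the path-traversal worry goes away. This is only a sketch under assumptions: the function name last_per_user and the search string MARKER are placeholders of mine, and the user name is taken to be whitespace-separated field 11 as described above.

```shell
#!/bin/sh
# Hypothetical sketch: print the last matching log line per user in one pass.
# Assumptions: "MARKER" stands in for the real string identifying lines of
# interest, and field 11 holds the user name.
last_per_user() {
    awk -v marker="MARKER" '
        index($0, marker) == 0 { next }                  # not a line of interest
        $11 !~ /^[A-Za-z0-9_][A-Za-z0-9_.-]*$/ { next }  # skip implausible user names
        { last[$11] = $0 }                               # later lines overwrite earlier ones
        END { for (u in last) print u ": " last[u] }     # order of users is unspecified
    ' "$@"
}
```

It could be called as last_per_user /path/to/logfile, or fed from a pipe. Memory use grows with the number of distinct users rather than the size of the log, because only one line per user is kept. Pipe the result through sort if a stable order is needed.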