awk : search last index in specific column


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting awk : search last index in specific column
# 15  
Old 11-13-2013
All,

Thanks for your inputs.
The suggested solution works for me
Code:
awk -F\| -v X="an" 'match ($NF, ".*"X) {print $0, RLENGTH-length(X)+1}' OFS=\| file

The project involves searching a list of keywords in a master input file.
1. master_file.txt containing strings. Record Count : 20000000 String Records.
2. keywords.txt containing keywords. Record Count : 200,000 Unique Keywords.

I run a shell script which reads a keyword and runs the above mentioned command to search in master_file.txt to append desired output.

The end result is being achieved as per our expectations.
However, i am concerned about the performance and response time of this utility.
I tried with 16000 keywords and 20000 master records and the process took around 25 minutes.

I am looking to reduce this number and considered the following:
1. Split up file into n part and run searches in parallel and then collate results?
2. Possible tweaking for commands ?
3. Is text mining in shell correct from a design and feasibility perspective ?

Please provide your inputs.
# 16  
Old 11-13-2013
You shouldn't run one command (= one process) per keyword, that's definitely too ineffective. On the other hand, the numbers you indicated might be too high for grep or awk. There's programs/applications/databases out there designed for text mining - I'm sure they'd be more appropriate for your task.
# 17  
Old 11-18-2013
Thanks Rudi.

Running in parallel is certainly one option and i am exploring that.
Certainly agree there are specific programs designed for it; but i wanted to invest some time to find out something at the ground level.
# 18  
Old 11-18-2013
I find shell script plus awk/sed/grep a great way to prototype a concept, but it's worth knowing the limitations of the tools you are using. I'd suggest sticking with a smaller subset and finalizing your prototype and in the background start researching text mining tools/relational databases etc.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Using awk to change a specific column and in a specific row

I am trying to change the number in bold to 2400 01,000300032,193631306,190619,0640,1,80,,2/ 02,193631306,000300032,1,190618,0640,CAD,2/ I'm not sure if sed or awk is the answer. I was going to use sed and do a character count up to that point, but that column directly before 0640 might... (8 Replies)
Discussion started by: juggernautjoee
8 Replies

2. What is on Your Mind?

Updated Forum Search Index Min Word Length to 2 Chars and Added Quick Search Bar

Today I changed the forum mysql database to permit 2 letter searches: ft_min_word_len=2 I rebuilt the mysql search indexes as well. Then, I added a "quick search bar" at the top of each page. I have tested this and two letter searches are working; but it's not perfect,... (1 Reply)
Discussion started by: Neo
1 Replies

3. Shell Programming and Scripting

Overwrite specific column in xml file with the specific column from adjacent line

I have an xml file dumped from rrd file, that I want to "patch" so the xml file doesn't contain any blank hole in the resulting graph of the rrd file. Here is the file. <!-- 2015-10-12 14:00:00 WIB / 1444633200 --> <row><v> 4.0419731265e+07 </v><v> 4.5045912770e+06... (2 Replies)
Discussion started by: rk4k
2 Replies

4. Shell Programming and Scripting

Search Replace Specific Column using RegEx

Have Pipe Delimited File: > BRYAN BAKER|4/4/2015|518 VIRGINIA AVE|TEST > JOE BAXTER|3/30/2015|2233 MockingBird RD|ROW2On 3rd column where the address is located, I want to add a space after every numeric value - basically doing a "s//&\ / ": > BRYAN BAKER|4/4/2015|5 1 8 VIRGINIA AVE|TEST > JOE... (5 Replies)
Discussion started by: svn
5 Replies

5. Shell Programming and Scripting

awk Search Array Element Return Index

Can you search AWK array elements and return each index value for that element. For example an array named car would have index make and element engine. I want to return all makes with engine size 1.6. Array woulld look like this: BMW 1.6 BMW 2.0 BMW 2.5 AUDI 1.8 AUDI 1.6 ... (11 Replies)
Discussion started by: u20sr
11 Replies

6. Shell Programming and Scripting

awk to search for specific line and replace nth column

I need to be able to search for a string in the first column and if that string exists than replace the nth column with "-9.99". AW12000012012 2.38 1.51 3.01 1.66 0.90 0.91 1.22 0.82 0.57 1.67 2.31 3.63 0.00 AW12000012013 1.52 0.90 1.20 1.34 1.21 0.67 ... (14 Replies)
Discussion started by: ncwxpanther
14 Replies

7. UNIX for Dummies Questions & Answers

Average by specific column value, awk

Hi, I am searching for an awk-script that computes the mean values for the $2 column, but addicted to the values in the $1 column. It also should delete the unnecessary lines after computing... An example (for some reason I cant use the code tag button): cat list.txt 1 10 1 30 1 20... (2 Replies)
Discussion started by: bjoern456
2 Replies

8. Shell Programming and Scripting

awk uniq and longest string of a column as index

I met a challenge to filter ~70 millions of sequence rows and I want using awk with conditions: 1) longest string of each pattern in column 2, ignore any sub-string, as the index; 2) all the unique patterns after 1); 3) print the whole row; input: 1 ABCDEFGHI longest_sequence1 2 ABCDEFGH... (12 Replies)
Discussion started by: yifangt
12 Replies

9. Shell Programming and Scripting

Assigning a specific format to a specific column in a text file using awk and printf

Hi, I have the following text file: 8 T1mapping_flip02 ok 128 108 30 1 665000-000008-000001.dcm 9 T1mapping_flip05 ok 128 108 30 1 665000-000009-000001.dcm 10 T1mapping_flip10 ok 128 108 30 1 665000-000010-000001.dcm 11 T1mapping_flip15 ok 128 108 30... (2 Replies)
Discussion started by: goodbenito
2 Replies

10. Shell Programming and Scripting

Insert a text from a specific row into a specific column using SED or AWK

Hi, I am having trouble converting a text file. I have been working for this whole day now, still i couldn't make it. Here is how the text file looks: _______________________________________________________ DEVICE STATUS INFORMATION FOR LOCATION 1: OPER STATES: Disabled E:Enabled ... (5 Replies)
Discussion started by: Issemael
5 Replies
Login or Register to Ask a Question