Single Liner for indexing


 
# 1  
Old 12-04-2014

Hello,

This is pretty simple. I'm looking for a faster and better method than the brute-force approach I'm using now.

I have a 20 GB file that looks like this:

Code:
Name1,Var1,Val1
Name1,Var2,Val2
Name2,Var1,Val3
Name2,Var2,Val4


I want 3 files.

Nameindex

1 Name1
2 Name2
...

Varindex

1 Var1
2 Var2
...

datafile

1 1 Val1
1 2 Val2
2 1 Val3
2 2 Val4

What I'm doing right now is:

Code:
cut -d, -f1 infile | sort -u | cat -n > nameindex

cut -d, -f2 infile | sort -u | cat -n > varindex

Then I sort the data file by each column in turn and join it against the matching index.
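
The poster did not show the sort-and-join step, so the following is only a guess at how it could look. It assumes hypothetical map files `name.map` and `var.map` in a `key,index` layout (easier to feed to `join` than `cat -n` output), a throwaway working directory, and the four sample lines standing in for the 20 GB file.

```shell
#!/bin/sh
# Sketch only: file names and the "key,index" map layout are assumptions.
set -eu
export LC_ALL=C                      # sort and join must agree on collation
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Four sample lines from the post stand in for the real input.
printf '%s\n' Name1,Var1,Val1 Name1,Var2,Val2 \
              Name2,Var1,Val3 Name2,Var2,Val4 > infile

# Build "key,index" maps; sort -u leaves them ordered by key, as join needs.
cut -d, -f1 infile | sort -u | awk '{ print $0 "," NR }' > name.map
cut -d, -f2 infile | sort -u | awk '{ print $0 "," NR }' > var.map

# Pass 1: sort by name, then replace each name with its index.
sort -t, -k1,1 infile | join -t, -o 2.2,1.2,1.3 - name.map > step1

# Pass 2: sort by variable, replace each variable with its index,
# then restore numeric order and switch to space-separated output.
sort -t, -k2,2 step1 | join -t, -1 2 -2 1 -o 1.1,2.2,1.3 - var.map |
    sort -t, -k1,1n -k2,2n | tr ',' ' ' > datafile

cat datafile
```

On the sample input this prints `1 1 Val1`, `1 2 Val2`, `2 1 Val3`, `2 2 Val4`. Every pass re-sorts the full data set, which is where the time goes on a 20 GB file.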
# 2  
Old 12-04-2014
awk -f sen.awk my20Gfile
where sen.awk is:
Code:
BEGIN {
  FS=","
  fileNames="senNames.txt"
  fileVars="senVars.txt"
  fileData="senData.txt"
}
{
  if(!($1 in names)) { names[$1];print ++namesC, $1 >> fileNames}
  if(!($2 in vars))  { vars[$2];print ++varsC, $2 >> fileVars}
  print namesC, varsC, $3 >> fileData
}

The resulting files will be senNames.txt, senVars.txt and senData.txt.
The file names can easily be changed in the BEGIN block of the script.
# 3  
Old 12-04-2014
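I think the counters should be assigned to the array items as values. As posted, the final print uses namesC and varsC, which hold the most recently assigned numbers rather than the indexes of the current line's name and variable. For example: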

Code:
awk -F, '
  !($1 in A) { 
    print A[$1]=++c, $1>"name.idx"
  } 

  !($2 in B) {
    print B[$2]=++d, $2>"var.idx"
  }
  
  {
    print A[$1], B[$2], $3>"file.dat"
  }
' file


---
It can be put on a single line, but it is a bit long that way:
Code:
awk -F, '!($1 in A){print A[$1]=++c, $1>"name.idx"} !($2 in B){print B[$2]=++d, $2>"var.idx"} {print A[$1], B[$2], $3>"file.dat"}' file
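
A quick way to check the single-pass approach is to run it on the four sample lines from the first post. This is a minimal sketch: the temporary directory and the input name `file` are assumptions, and the assignment is written as its own statement before each `print`, which behaves the same as the version above.

```shell
#!/bin/sh
# Sketch only: runs the single-pass awk indexer on the thread's sample data.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf '%s\n' Name1,Var1,Val1 Name1,Var2,Val2 \
              Name2,Var1,Val3 Name2,Var2,Val4 > file

awk -F, '
  # First sighting of a name or variable: give it the next number.
  !($1 in A) { A[$1] = ++c; print A[$1], $1 > "name.idx" }
  !($2 in B) { B[$2] = ++d; print B[$2], $2 > "var.idx" }
  # Every line: emit the stored indexes together with the value.
  { print A[$1], B[$2], $3 > "file.dat" }
' file

for f in name.idx var.idx file.dat; do
    echo "== $f"
    cat "$f"
done
```

Indexes are handed out in order of first appearance, not in sorted order as with `sort -u | cat -n`. The arrays hold one entry per distinct name and variable, not per input line, so memory use depends on the number of distinct keys.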
