Adding variables to repeating strings


 
# 8  
Old 02-01-2013
Please post a bigger sample of the real data file.
# 9  
Old 02-01-2013
This is a larger sample:

Code:
 DOG000832
 DOG000832
 DOG000833
 DOG000833
 DOG000834
 DOG000835
 DOG000835
 DOG000837
 DOG000839
 DOG000839
 DOG000840
 DOG000840
 DOG000841
 DOG000841
 DOG000842
 DOG000843
 DOG000844
 DOG000844
 DOG000847
 DOG000848
 DOG000848
 DOG000849
 DOG000850
 DOG000850
 DOG000851
 DOG000852
 DOG000852
 DOG000853
 DOG000853
 DOG000854
 DOG000854
 DOG000855
 DOG000855
 DOG000856
 DOG000857
 DOG000858
 DOG000858
 DOG000859
 DOG000859
 DOG000860
 DOG000860
 DOG000861
 DOG000862
 DOG000862
 DOG000863
 DOG000864
 DOG000865
 DOG000865
 DOG000866


Last edited by Scrutinizer; 02-01-2013 at 02:06 PM..
# 10  
Old 02-01-2013
OK, there was an error in my code; try this one:

Code:
awk 'END {
  for (i = 0; ++i <= NR;) {
    c[substr(r[i], length(r[i]))] > 1 && r[i] = r[i] OFS l[s[i]] 
    print r[i]
    }
  }
{  
  r[NR] = $0
  s[NR] = ++c[substr($0, length($0))]  
  }
BEGIN {  
  split("a b c d e f g h i j k l m n o p q r s t u v w x y z", l)
  }' infile

If you want to use it in a script:

Code:
END {
  for (i = 0; ++i <= NR;) {
    c[substr(r[i], length(r[i]))] > 1 && r[i] = r[i] OFS l[s[i]] 
    print r[i]
    }
  }
{  
  r[NR] = $0
  s[NR] = ++c[substr($0, length($0))]  
  }
BEGIN {  
  split("a b c d e f g h i j k l m n o p q r s t u v w x y z", l)
  }
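
To illustrate the script-file route, here is a minimal sketch that saves the program from this post to a file and runs it with `awk -f`. The temporary file and the three-line sample input are assumptions for the demo, and the assignment is wrapped in parentheses as a portability precaution that is not in the posted code. Note that this version of the program still keys on the last character of each line; a corrected program can be saved and run the same way.

```shell
# Save the awk program to a file (a temporary file here; any name works).
script=$(mktemp)
cat > "$script" <<'EOF'
END {
  for (i = 0; ++i <= NR;) {
    # parentheses around the assignment added for portability (assumption)
    c[substr(r[i], length(r[i]))] > 1 && (r[i] = r[i] OFS l[s[i]])
    print r[i]
  }
}
{
  r[NR] = $0
  s[NR] = ++c[substr($0, length($0))]
}
BEGIN {
  split("a b c d e f g h i j k l m n o p q r s t u v w x y z", l)
}
EOF

# Run the saved program with -f instead of quoting it on the command line.
out=$(printf '%s\n' DOG000832 DOG000832 DOG000833 | awk -f "$script")
printf '%s\n' "$out"
rm -f "$script"
```

With a real data file, the last step would simply be `awk -f yourscript.awk infile`, where `yourscript.awk` is whatever name you saved the program under.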

# 11  
Old 02-01-2013
It seems that the 26 letters are not enough. I tried to include capital letters as well but that wasn't enough either, even though the largest repeat is only 9 times.
# 12  
Old 02-01-2013
Could you please post an example? A sample that demonstrates the behavior described above.
# 13  
Old 02-01-2013
Hi Verse123,
I guess I can't figure out the logic behind radoulov's script. It seems to me that the array c is counting the number of times the last character in an input line appears instead of the number of times the whole input line appears. (And you asked for a "-" between the input and the repeat indicator where radoulov's script used the output field separator instead.)
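
The difference between the two counting keys can be seen on two IDs taken from the sample above. This is only a sketch of the counting step, not either poster's full script. It may also explain why 26 letters ran out: with the last character as the key, every ID ending in the same digit bumps the same counter.

```shell
# Two different IDs from the sample that happen to end in the same digit.
sample='DOG000832
DOG000842'

# Counter keyed by the last character only.
by_last_char=$(printf '%s\n' "$sample" |
  awk '{ n = ++c[substr($0, length($0))] } END { print n }')

# Counter keyed by the whole line.
by_whole_line=$(printf '%s\n' "$sample" |
  awk '{ n = ++c[$0] } END { print n }')

echo "last-character count: $by_last_char"   # 2: both lines share the key "2"
echo "whole-line count:     $by_whole_line"  # 1: each line is its own key
```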

Try using this as the awk command file:
Code:
BEGIN { suffices="-a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"}
{       o[NR] = $1
        s[NR] = substr(suffices, ++c[$1] * 2 - 1, 2)
        if(c[$1] > 26) s[NR] = "-*"
}
END {   for(i = 1; i <= NR; i++)
                printf("%s%s\n", o[i], c[o[i]] > 1 ? s[i] : "")
}

Note that I used $1 instead of $0 because you talked about doing this for a column. With the input you gave us, $0 and $1 are the same. If you later use input that has multiple fields, this script will only look at, and only print, the 1st field; the remaining fields are dropped from the output.
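
As a quick check of the script above, here is a sketch that feeds it three lines with a made-up second column (the `x`, `y`, `z` values are assumptions for illustration, not data from the thread):

```shell
# Hypothetical two-column input; only the first column comes from the thread.
out=$(printf '%s\n' 'DOG000832 x' 'DOG000832 y' 'DOG000833 z' |
awk '
BEGIN { suffices = "-a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z" }
{       o[NR] = $1                                      # remember field 1 in input order
        s[NR] = substr(suffices, ++c[$1] * 2 - 1, 2)    # "-a" for 1st, "-b" for 2nd, ...
        if (c[$1] > 26) s[NR] = "-*"                    # ran out of letters
}
END {   for (i = 1; i <= NR; i++)
                printf("%s%s\n", o[i], c[o[i]] > 1 ? s[i] : "")
}')
printf '%s\n' "$out"
# DOG000832-a
# DOG000832-b
# DOG000833
```

The repeated ID gets `-a` and `-b`, the unique ID is left alone, and the second column does not appear because only `$1` is stored.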
# 14  
Old 02-01-2013
Hi Don,

Quote:
I guess I can't figure out the logic behind radoulov's script.
Sure, sorry for posting obfuscated, uncommented code.

Quote:
It seems to me that the array c is counting the number of times the last character in an input line appears instead of the number of times the whole input line appears.
Correct, I really don't know why I was thinking that verse123 wanted to track down the last character in the strings only ...

Quote:
And you asked for a "-" between the input and the repeat indicator where radoulov's script used the output field separator instead.
And that's another detail that I missed.

Just for the record, I'm posting a corrected version of my code
(if I'm not missing something again, of course):

Code:
awk 'END {
  for (i = 0; ++i <= NR;) {
    c[r[i]] > 1 && r[i] = r[i] "-" l[s[i]] 
    print r[i]
    }
  }
{  
  r[NR] = $0
  s[NR] = ++c[$0]  
  }
BEGIN {  
  split("a b c d e f g h i j k l m n o p q r s t u v w x y z", l)
  }' infile

Thanks for bearing with me and sorry for the noise!

Commented version:
Code:
awk 'END {
# loop over the records (r) array
  for (i = 0; ++i <= NR;) {
  # if the record appears more than once
  # add the suffix using the letters (l) array   
    c[r[i]] > 1 && r[i] = r[i] "-" l[s[i]] 
    print r[i]
    }
  }
{  
# store all records in an array
# named r (for records), keyed by NR
  r[NR] = $0
# record which occurrence of this line we are on
# (1 for the first time it is seen, 2 for the second, ...)
# in an array named s (for suffix), using an array
# named c (for count) to keep the running total
  s[NR] = ++c[$0]  
  }
BEGIN {  
# prepare the array l (for letters)
# containing the alphabet  
  split("a b c d e f g h i j k l m n o p q r s t u v w x y z", l)
  }' infile
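
For reference, here is a sketch of the same whole-line logic run on the first few IDs from the sample in post #9. It uses an explicit `if` and a conventional `for` loop in place of the `&&` short-circuit, which is a stylistic substitution and not the code as posted; the intended behaviour is the same.

```shell
out=$(printf '%s\n' DOG000832 DOG000832 DOG000833 DOG000833 DOG000834 |
awk '
BEGIN { split("a b c d e f g h i j k l m n o p q r s t u v w x y z", l) }
{ r[NR] = $0; s[NR] = ++c[$0] }          # keep each line and its occurrence number
END {
  for (i = 1; i <= NR; i++) {
    if (c[r[i]] > 1)                     # line appears more than once in the input
      r[i] = r[i] "-" l[s[i]]            # append "-a", "-b", ... by occurrence
    print r[i]
  }
}')
printf '%s\n' "$out"
# DOG000832-a
# DOG000832-b
# DOG000833-a
# DOG000833-b
# DOG000834
```

One limitation to keep in mind: the letters array has 26 entries, so a line repeated more than 26 times would get a bare "-" with no letter from the 27th occurrence onward.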


Last edited by radoulov; 02-01-2013 at 05:09 PM..
 