Adding variables to repeating strings | Unix Linux Forums | UNIX for Dummies Questions & Answers

    #1  
01-31-2013
verse123
Registered User
 
Join Date: Oct 2011
Last Activity: 1 June 2014, 10:33 PM EDT
Posts: 83
Thanks: 20
Thanked 1 Time in 1 Post
Adding variables to repeating strings

Hello,

I want to add a letter to the end of a string if it repeats in a column.

so if I have a file like this:


Code:
DOG001
DOG0023
DOG004
DOG001
DOG0023
DOG001

the output should look like this:


Code:
DOG001-a
DOG0023-a
DOG004
DOG001-b
DOG0023-b
DOG001-c


how can I do this? thanks in advance

Last edited by Scrutinizer; 01-31-2013 at 06:47 PM.. Reason: quote -> code tags
    #2  
01-31-2013
radoulov, Forum Staff
Moderator
 
Join Date: Jan 2007
Last Activity: 24 July 2014, 9:57 AM EDT
Location: Varna, Bulgaria / Milan, Italy
Posts: 5,663
Thanks: 182
Thanked 616 Times in 574 Posts

Code:
awk 'END {
  for (i = 0; ++i <= NR;) {
    c[substr(r[i], length(r[i]) - 1)] > 1 && r[i] = r[i] OFS l[s[i]] 
    print r[i]
    }
  }
{  
  r[NR] = $0
  s[NR] = ++c[substr($0, length($0) - 1)]  
  }
BEGIN {  
  split("a b c d e f g h i j k l m n o p q r s t u v w x y z", l)
  }' infile

With some awk implementations, NR may not be available in the END block;
let me know if you're using one of these.
You may also need to decide what to do if a value repeats more often than there are letters in the alphabet.
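A minimal sketch of the same buffering approach for awks where NR can't be trusted in END: it keeps its own line counter n in the main block instead. Note that the script above counts occurrences by substr($0, length($0) - 1), i.e. the last two characters of each line, which happens to work for the sample; this sketch keys on the whole line instead and joins the letter with "-" as in the requested output. The function name tag_repeats is hypothetical, and it assumes no value repeats more than 26 times.

```shell
#!/bin/sh
# tag_repeats is a hypothetical name: it appends -a, -b, ... to every line
# that occurs more than once in the input and leaves unique lines unchanged.
tag_repeats() {
  awk '
    BEGIN { split("a b c d e f g h i j k l m n o p q r s t u v w x y z", l) }
    {
      n++               # own line counter, so END does not depend on NR
      r[n] = $0         # buffer the line
      s[n] = ++c[$0]    # running count, keyed on the whole line
    }
    END {
      for (i = 1; i <= n; i++) {
        if (c[r[i]] > 1)             # c[] now holds each line total
          print r[i] "-" l[s[i]]     # assumes at most 26 repeats
        else
          print r[i]
      }
    }
  ' "$@"
}

printf '%s\n' DOG001 DOG0023 DOG004 DOG001 DOG0023 DOG001 | tag_repeats
```

Run it as tag_repeats infile, or pipe data into it; for the sample input it prints the six lines of the requested output, from DOG001-a to DOG001-c.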
    #3  
Old 01-31-2013
verse123 verse123 is offline
Registered User
 
Join Date: Oct 2011
Last Activity: 1 June 2014, 10:33 PM EDT
Posts: 83
Thanks: 20
Thanked 1 Time in 1 Post
This is the error message I am receiving:


Code:
-bash: echo: write error: Broken pipe
awk: f1.awk:1: awk 'END {
awk: f1.awk:1:     ^ invalid char ''' in expression

    #4  
01-31-2013
radoulov, Forum Staff
Moderator
 
Join Date: Jan 2007
Last Activity: 24 July 2014, 9:57 AM EDT
Location: Varna, Bulgaria / Milan, Italy
Posts: 5,663
Thanks: 182
Thanked 616 Times in 574 Posts
Wow, try the following:

1. Create a script file with the following content:

Code:
END {
  for (i = 0; ++i <= NR;) {
    c[substr(r[i], length(r[i]) - 1)] > 1 && r[i] = r[i] OFS l[s[i]] 
    print r[i]
    }
  }
{  
  r[NR] = $0
  s[NR] = ++c[substr($0, length($0) - 1)]  
  }
BEGIN {  
  split("a b c d e f g h i j k l m n o p q r s t u v w x y z", l)
  }

2. Run the following command:

Code:
awk -f script_name input_file

Edit: OK, I realized that you put the entire command in the script file.
Please remove the leading awk ' and the trailing ' infile; the script file should contain only the program text.
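A small illustration of the difference, using a hypothetical one-line program rather than the full script: inline, the single quotes are shell syntax that the shell strips before awk sees the program; with -f, awk reads the file as-is, so a stray ' in the file reaches awk itself, which is what the "invalid char ''' in expression" error in post #3 reports. The file names infile and dupes.awk are assumptions.

```shell
#!/bin/sh
# Work in a scratch directory so the hypothetical files infile and
# dupes.awk do not clobber anything.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' DOG001 DOG004 DOG001 > infile

# 1. Inline: the shell removes the quotes and hands awk the program text.
awk '{ print ++c[$0], $0 }' infile

# 2. Script file: the file holds only the program text,
#    with no leading awk ' and no trailing ' infile.
cat > dupes.awk <<'EOF'
{ print ++c[$0], $0 }
EOF
awk -f dupes.awk infile
```

Both commands print the same three lines (1 DOG001, 1 DOG004, 2 DOG001).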
    #5  
01-31-2013
verse123
Registered User
 
Join Date: Oct 2011
Last Activity: 1 June 2014, 10:33 PM EDT
Posts: 83
Thanks: 20
Thanked 1 Time in 1 Post
Sorry, I'm new to this.

This file has 21,092 lines, and from around line 5427 the script begins to skip some repeats and doesn't assign them a letter. Toward the very bottom of the file, hardly any repeats are assigned letters. Is there a size limitation?

---------- Post updated at 05:49 PM ---------- Previous update was at 05:39 PM ----------

I also noticed that in some cases it skips letters, as in the sample below:


Code:
DOG000160 a
DOG000160 b
DOG000161 e
DOG000161 f
DOG000162 b

It's labeling DOG000161 "e" instead of "a", and DOG000162 should be "a", not "b". Why do you suppose this is?
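A likely cause, reading the script from post #2: it counts by substr($0, length($0) - 1), which is only the last two characters of each line. So DOG000061 and DOG000161 share the key 61, and counts (and therefore letters) carry over between different IDs. Once a two-character suffix has been counted more than 26 times, l[s[i]] is empty and no letter is printed, which would fit letters thinning out further down a long file. This sketch, with a hypothetical function name and made-up values, just prints the key each line ends up with:

```shell
#!/bin/sh
# show_keys (hypothetical name) prints the counting key the post #2 script
# uses for each line, i.e. substr($0, length($0) - 1): the last two characters.
show_keys() {
  awk '{
    key = substr($0, length($0) - 1)   # same expression as in post #2
    print $0, "->", key, "(seen", ++c[key], "times)"
  }'
}

# Made-up values: different IDs that end in the same two digits.
printf '%s\n' DOG000061 DOG000161 DOG000160 DOG000060 | show_keys
```

For these four lines it prints, for example, DOG000161 -> 61 (seen 2 times). Keying the counts on $0 (or on the column that holds the ID) instead of the two-character substring avoids the collision.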

Last edited by Scrutinizer; 01-31-2013 at 06:48 PM.. Reason: quote tags to code tags
    #6  
01-31-2013
Don Cragun, Forum Staff
Moderator
 
Join Date: Jul 2012
Last Activity: 24 July 2014, 1:29 PM EDT
Location: San Jose, CA, USA
Posts: 4,128
Thanks: 161
Thanked 1,411 Times in 1,196 Posts
Quote:
Originally Posted by verse123
sorry im new to this,

so this file has 21,092 lines and around line 5427 this script begins to skip some repeats and doesn't assign a letter. Towards the very bottom of the file there are hardly any repeats being assigned letters. Is there a size limitation to this?

---------- Post updated at 05:49 PM ---------- Previous update was at 05:39 PM ----------

I also noticed that in some cases it jumps letters like in the sample below



it's calling DOG000161 "e" instead of "a". And DOG000162 "b" is really supposed to be "a". Why do you suppose this is?
Are there more than 26 occurrences of a single input value?

If so, what "letter" do you want to assign when an input value appears more than 26 times?

Do all of the values that aren't assigned trailing letters appear more than once in your input?

Do any of the values that aren't assigned trailing letters appear more than once but less than 27 times?
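For reference on the 26-letter question: in the posted script l[27] is never set, so a 27th occurrence gets an empty string instead of a letter. If values could repeat more often, one possible convention (an assumption, not something this thread settled) is to continue a ... z, aa, ab, ... like spreadsheet column names, i.e. bijective base 26. A sketch with a hypothetical helper named label:

```shell
#!/bin/sh
# label N (hypothetical helper): print the N-th label in the sequence
# a..z, aa..az, ba..zz, aaa, ... (bijective base 26).
label() {
  awk -v n="$1" 'BEGIN {
    alpha = "abcdefghijklmnopqrstuvwxyz"
    s = ""
    while (n > 0) {
      n--                                  # shift to a 0-based digit
      s = substr(alpha, n % 26 + 1, 1) s   # prepend the current letter
      n = int(n / 26)
    }
    print s
  }'
}

for i in 1 26 27 28 53; do printf '%s ' "$(label "$i")"; done; echo
```

label 27 prints aa and label 53 prints ba; the same loop could be moved into an awk function and used in place of l[s[i]].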
    #7  
02-01-2013
verse123
Registered User
 
Join Date: Oct 2011
Last Activity: 1 June 2014, 10:33 PM EDT
Posts: 83
Thanks: 20
Thanked 1 Time in 1 Post
There are no more than 9 occurrences of any single input value.

Not all of the values that aren't assigned trailing letters appear more than once: some appear only once, and some appear several times, but never more than 9 times.
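With at most 9 repeats per value, the 26-letter table is plenty, so the two-character key in post #2 is the more likely culprit. Another possible layout, sketched here rather than taken from the thread, reads the file twice: the first pass counts each value, the second appends a letter only when the total is above 1. It assumes the IDs are in the first column ($1), keeps only per-value counts in memory rather than every line, and uses a hypothetical function name, tag_column.

```shell
#!/bin/sh
# tag_column FILE (hypothetical name): two passes over the same file.
tag_column() {
  awk '
    NR == FNR { total[$1]++; next }   # pass 1: count each first-field value
    total[$1] > 1 {                   # pass 2: this value repeats somewhere
      key = $1
      $1 = key "-" substr("abcdefghijklmnopqrstuvwxyz", ++seen[key], 1)
      print
      next
    }
    { print }                         # unique values unchanged
  ' "$1" "$1"
}

f=$(mktemp) || exit 1
printf '%s\n' DOG001 DOG0023 DOG004 DOG001 DOG0023 DOG001 > "$f"
tag_column "$f"
rm -f "$f"
```

On lines with several columns, assigning to $1 makes awk rebuild the line with single spaces between fields.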

All times are GMT -4.