Parsing a column and extracting subsets


 
# 1  
Old 07-23-2013

Please help with this. My files exceed 40 GB, so this cannot be done manually.

The second column holds strings such as 5M108N31M or 3S2M100N45M4S; the first column is a number.

Each string can contain 0, 1, or 2 S entries, 1 or 2 M entries, and exactly 1 N.
S occurs only at the beginning or at the end of the string.

I want to extract the values corresponding to M, S, and N.

Example for the string 5M108N31M:

S1=0 (since no S is present)
M1=5
N=108
M2=31

For the second string, 3S2M100N45M4S:

S1=3
M1=2
N=100
M2=45
S2=4

In the output, the first column is (1st column of input + S1 + M1).
In the output, the second column is (1st column of input + S1 + M1 + N).
M2 and S2 are not needed for the calculations.

Sample input
Code:
100 5M108N31M
100 3S2M100N45M4S

Sample output
Code:
105 213
105 205

# 2  
Old 07-23-2013
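Try:
Code:
awk '{s=m=n=0}   # reset for every line
# For each letter: cut from its first occurrence onward, then strip
# everything up to the last letter or space, leaving the preceding number.
/S/ {s=$0; sub("S.*","",s); sub(".*[A-Z ]","",s)}
/M/ {m=$0; sub("M.*","",m); sub(".*[A-Z ]","",m)}
/N/ {n=$0; sub("N.*","",n); sub(".*[A-Z ]","",n)}
{print $1+s+m, $1+s+m+n}
' input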

# 3  
Old 07-23-2013
Something along these lines:
Code:
awk -f ritak.awk mySampleFile

where ritak.awk is:
Code:
{
  s=m=n=0

  match($2, "[0-9][0-9]*M")
  m=substr($2, RSTART, RLENGTH-1)

  match($2, "[0-9][0-9]*S")
  s=substr($2, RSTART, RLENGTH-1)

  match($2, "[0-9][0-9]*N")
  n=substr($2, RSTART, RLENGTH-1)

  print $1+s+m, $1+s+m+n
}
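
One case the sample data does not cover: a string with a trailing S but no leading S, for example 5M108N31M4S. The unanchored match for S in the script above would then pick up the trailing 4 as S1. A minimal sketch of one way around that, assuming S1 should count only when S opens the string. The wrapper name span_bounds is hypothetical, and the third input line is an invented test case, not from the original data:

```shell
# Hypothetical helper: reads "<number> <string>" lines on stdin.
span_bounds() {
  awk '{
    s = m = n = 0
    # Leading S only: anchored at the start so a trailing S is not taken as S1.
    if (match($2, /^[0-9][0-9]*S/)) s = substr($2, RSTART, RLENGTH - 1)
    # The first M in the string is M1.
    if (match($2, /[0-9][0-9]*M/))  m = substr($2, RSTART, RLENGTH - 1)
    # The single N.
    if (match($2, /[0-9][0-9]*N/))  n = substr($2, RSTART, RLENGTH - 1)
    print $1 + s + m, $1 + s + m + n
  }'
}

printf '%s\n' '100 5M108N31M' '100 3S2M100N45M4S' '100 5M108N31M4S' | span_bounds
```

On those three lines this should print 105 213, 105 205, and 105 213. For the real files, replace the printf with a redirect, e.g. span_bounds < mySampleFile.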

# 4  
Old 07-23-2013
Try:
Code:
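awk -F'[^0-9]*' '
  { 
    # Fields are the digit runs; L gets the text between them (the letters).
    split($0,L,/[0-9]*/)
    # Walk right to left so the first M (and a leading S) overwrite later ones.
    for(i=NF-1; i>1; i--) A[L[i+1]]=$i
    n=$1 + A["S"]+A["M"]
    print n, n+A["N"]
    # Clear S so it does not carry over to a line without one (x is unset).
    A["S"]=x
  }
' file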

# 5  
Old 07-23-2013
Excellent. Thanks a lot!