Splitting a column in two separate fields


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Splitting a column in two separate fields
# 1  
Old 04-24-2014
Splitting a column in two separate fields

for making a summary
I have a CSV file which is transformed to .DAT. I have an AWK file which is supposing to do the mapping of the DAT file. The code from the AWK file is the one below.

The content of the DAT file looks like this (tab separated):

Code:
 ODT AGE CDT CO SEX TIME VALUE COMMENT
    P3 Y6-8 ACT FG F 2011 1297
    P4 Y3-4 EMP FG M 2011 6940 bd
    P1 Y7-9 GRT FG F 2011 0 c

What I have to do is the following:

1. COMMENT columns must be splitted in two different fields: STRING_COM and STRING_STATUS - done
2. VALUE column should be renamed in "NUMB" - done
3. keep the headers together with the columns order - done

**NOT DONE**

Code:
  4.  if the VALUE is ":" then NUMB is null
        if the VALUE is ":" and COMMENT "c" then NUMB is null and STRING_COM is "c"
        if the VALUE is ":" and COMMENT "u" then NUMB is null and STRING_STATUS is "u"
        if the VALUE is "14,385" and COMMENT "d" then NUMB is "14385" and STRING(both) is null
        if the VALUE is "14,385" and COMMENT "du" then NUMB is "14385" and STRING_STATUS is "u"
        if the VALUE is ":" and COMMENT "cd" then NUMB is null and STRING_COM is "c"
        if the VALUE is ":" and COMMENT "bc" then NUMB is null and STRING_COM is "c" and STRING_STATUS is "b"
        if the VALUE is ":" and COMMENT "z" then NUMB is 0 and STRING_STATUS is "z"

4. No matter what other columns will be received except the ones mentioned here they must be removed - done

**awk code:**

Code:
 BEGIN {
      FS=","; OFS="\t";
      a["ODT"]=1;a["AGE"]=1;a["CDT"]=1;a["CO"]=1;
        a["SEX"]=1;a["TIME"]=1;a["VALUE"]=1;a["COMMENT"]=1;
    }
    NR==1 {
        $NF=substr($NF,1,length($NF)-1);
        for(i=1;i<=NF;i++) if($i in a) a[$i]=i;
    }
    {   print $a["ODT"],$a["AGE"],$a["CDT"],$a["CO"],$a["SEX"],$a["TIME"],NR==1?"NUMB":$a["VALUE"],
        NR==1?"STRING_COM"OFS"STRING_STATUS":($a["COMMENT"]?""OFS$a["COMMENT"]:$a["COMMENT"]);
    }

Does anyone knows how can I solve point 4?

**expected result should be**

**csv input**

Code:
ODT AGE CDT CO  SEX TIME    NUMB   COMMENT
    P3    Y6-8    AWT    EE    F    2011    1297    
    P4    Y3-4    ESP    RR    M    2011    6940    cd
    P1    Y7-9    UDK    FF    F    2011    :    du
    PL    Y3-9    EUP    SS    F    2011    :       d
    P9    Y_5    ACT    DD    F    2011    :    cd
    P6    Y5-9    UAK    DF    M    2011    :    z

OUTPUT
Code:
  
ODT AGE CDT CO  SEX TIME    NUMB   STRING_COM       STRING_STATUS
    P3    Y6-8    AWT    EE    F    2011    1297    
    P4    Y3-4    ESP    RR    M    2011    6940    c
    P1    Y7-9    UDK    FF    F    2011           u
    PL    Y3-9    EUP    SS    F    2011        
    P9    Y_5    ACT    DD    F    2011        c
    P6    Y5-9    UAK    DF    M    2011    0       z

Thank you in advance
# 2  
Old 05-01-2014
The specification of what you're trying to do is confusing.

Your awk script is using comma as the input field separator, but your sample input file is using spaces (not commas) as field separators. Furthermore, part of your rules in step 4:
Code:
        if the VALUE is "14,385" and COMMENT "d" then NUMB is "14385" and STRING(both) is null
        if the VALUE is "14,385" and COMMENT "du" then NUMB is "14385" and STRING_STATUS is "u"

contains commas in the data in a field. Is that field supposed to be quoted in your input.csv file? If not, you can't have commas that are both data and field separators.

You say that you want tab as the field separator in your output file, but the sample output you have shown us has leading spaces on many of the lines and one or four spaces as field separators. And with headings STRING_COM and STRING_STATUS (both of which are more than eight characters wide), the data won't line up under your output headers. (Could you use COM and STATUS as output heading instead of STRING_COM and STRING_STATUS?

There are several combinations of VALUE and COMMENT listed in rule 4 that are not included in your sample data.

There is nothing in rule 4 that says what should happen if VALUE is neither a colon nor 14,385. Add a case (or cases) to rule 4 to cover all expected input combinations.

Please update your request to make the sample data you provide (both input.csv and output) match your specifications, update rule 4 to cover all possible input, and add sample input and output that tests all of the conditions specified in your updated rule 4.

When your specification of a problem is incomplete and your sample input and output don't match the description of your input and desired output, it is highly likely that you won't get any response to your request for help. When your specifications are clear and your sample data shows the transformations you're trying to perform, you are much more likely to get a quick response.
This User Gave Thanks to Don Cragun For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

File splitting according to the length of the fields

Hi All, I have two files: 1> Data file 2> info file which has field lengths and start position. Is there a way to create a comma delimited file according to the fields length and start position. Data file : R-0000017611N-00000350001095ANZU01 A00000017611N000000350001095ANZU02... (11 Replies)
Discussion started by: nua7
11 Replies

2. Shell Programming and Scripting

Separate fields

awk 'NF==2{s=$1;next}{$(NF+1)=s}1' sort.txt > output.txt A_16_P32713632 chr10 90695750 90695810 ACTA2 A_16_P32713635 chr10 90696573 90696633 ACTA2 A_16_P32713680 chr10 90697419 90697479 ACTA2 The command above outputs the data as a string separated by a space in 1 field. I can not... (6 Replies)
Discussion started by: cmccabe
6 Replies

3. Shell Programming and Scripting

Help in splitting Sub Fields and compare with other field

Hi All, We are trying to pull out data from below table, the table contains four fields and out of which last two fields are having sub-fields with delimiter $, we want to identify number "1" position in the 3rd field and from 4th field need to extract the information from the same position. ... (4 Replies)
Discussion started by: rramkrishnas
4 Replies

4. Shell Programming and Scripting

Help with splitting fields

Hi. I want to put the first field to the end and the lines are of different number of fields. How should I do this with awk? Thanks. (3 Replies)
Discussion started by: dustinwang2003
3 Replies

5. Shell Programming and Scripting

Print every 5 4th column values as separate row with different first column

Hi, I have the following file, chr1 100 200 20 chr1 201 300 22 chr1 220 345 23 chr1 230 456 33.5 chr1 243 567 90 chr1 345 600 20 chr1 430 619 21.78 chr1 870 910 112.3 chr1 914 920 12 chr1 930 999 13 My output would be peak1 20 22 23 33.5 90 peak2 20 21.78 112.3 12 13 Here the... (3 Replies)
Discussion started by: jacobs.smith
3 Replies

6. Shell Programming and Scripting

separate the file according to the number of fields

I have a file which is delimetered by ',' i need to filter out a file with respect to the number of fileds in each line. a,s,d,f,g,h,j,k,l 1,2,3,3,4,5,6,7,6 a,2,3 4,5,6,7 in this i neeed to filter out the lines with 8 column to another file and rest to another file. so ... (3 Replies)
Discussion started by: ratheeshjulk
3 Replies

7. Shell Programming and Scripting

Splitting text file into 2 separate files ??

Hi All, I am new to this forumn as well to the UNIX, I have basic knowledge of UNIX which I studied some years ago, now I have to do some shell scripting to load data into Oracle database using sqlldr utility, whcih I am able to do. I have a requirement where I need to do following operation. I... (10 Replies)
Discussion started by: shekharjchandra
10 Replies

8. Shell Programming and Scripting

Separate fields

Hi everyone! I have a field like that: I need to keep I don't know how to use the Capital character like a separator and how to keep only this one... I guess sed could do something like that... Thanks;) (3 Replies)
Discussion started by: Castelior
3 Replies

9. Shell Programming and Scripting

Need help in splitting a line into fields in shell scripting

I have a line of more than 3000 bytes which will contain & as fields separator..I am using following awk command ..Its working but its not accepting the line more than 3000 bytes...Anyother alternate solution even in othe shell command also fine... awk -F '&' '{for( i=1; i<=NF; i++ ) print $i}'... (2 Replies)
Discussion started by: punithavel
2 Replies

10. UNIX for Dummies Questions & Answers

Separate special character fields

Hi, I have a file with User ID and User Name. Sometimes the file has speical characters in the USer ID and it creates problems. I want to allow all those fields to be processed which have only numbers and characters. I do NOT want to process those fields with Speical Characters. How... (1 Reply)
Discussion started by: baanprog
1 Replies
Login or Register to Ask a Question