Extract columns from a file if the name dont exist put blank


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Extract columns from a file if the name dont exist put blank
# 8  
Old 10-05-2010
Quote:
Originally Posted by nightrider
...
What if I don't know beforehand the columns that were missing? Will your code still work? ...
No, it won't. The posted code -
(a) adds an "ID" column in front of first column
(b) adds an "Email" column right before the 3rd column

Since it performs simple substitution and does not check for column headers specifically, you'd have two "ID" columns if your first column actually happens to be an "ID" column !

Quote:
...
I was using for loop in my earlier posting to perform a column header name matching to try to find out 1) the missing column ( as this is not fixed)
2) the position of the column if it matches...
The fact that you do not know what all columns could be missing is most significant and it makes your problem much more difficult.

Here's what I think -
(a) Firstly, you will have to know your "desired" column headers. You won't know what's missing unless you have a complete list for comparison.

(b) Secondly, there's a question about "gaps" as well as "order". Let's say your desired column header is "c1,c2,c3,c4,c5" and your file has "c1,c3,c5". The missing columns over here are "c2,c4", but your file still has columns in order.

Now, how do you want to handle the case when your file has columns "c4,c3,c1" ? There are missing columns, but the existing columns are out of order as well. Is this possible at all ? Do you want to change that to your "desired" order - i.e. fill the gaps as well as reorder ?

tyler_durden

Last edited by durden_tyler; 10-05-2010 at 05:38 PM..
# 9  
Old 10-05-2010
Yes Tyler. There will be a desired coLumn header. And the script will have to fix gaps as well as order.

I don't know how to do this in a single unix script, is this even possible?

If not then I am thinking to use a macro tool to process the header, compare with the desired column header and order, then generate an awk command that will print the columns, like this:
1) macro read the column header and position
2) macro compare this with the desired column
3) macro generate awk print statement, ordering the position returned in (1) and where there is a gap, macro will generate print "\t"
4) pass the generated awk statement to unix to execute

Possible to do this entirely in unix?
# 10  
Old 10-05-2010
Yes it's possible.

Code:
$ 
$ cat rearrange.awk
BEGIN {
  IFS = "\t"; OFS = "\t"                                               # set the field separators
  num = split("ID,NAME1,NAME2,EMAIL,ADDRESS,TEL",main,",")             # set the "main" array
}
{
  if (NR==1) {                                                         # if we are at the first line
    for (i=1; i<=NF; i++) {                                            # then loop through the headers
      for (j=1; j<=num; j++) {                                         # and search for them in main array
        if ($i == main[j]) {map[i] = j; break}                         # if match found then
      }                                                                # set "map" array and break
    }
  for (j=1; j<=num; j++) printf j==1 ? main[j] : OFS main[j]           # print headers in correct order
  printf "\n"
  } else {                                                             # for line number > 1
    for (i=1; i<=NF; i++) template[map[i]] = $i                        # loop through fields; set "template"
    for (j=1; j<=num; j++) printf j==1 ? template[j] : OFS template[j] # and simply print template
    printf "\n"
  }
}
$ 
$ cat abc.txt
NAME1    NAME2    ADDRESS    TEL
A    B    ADD1    1234
X    Y    ADD2    5678
$ 
$ 
$ awk -f rearrange.awk abc.txt
ID    NAME1    NAME2    EMAIL    ADDRESS    TEL
    A    B        ADD1    1234
    X    Y        ADD2    5678
$ 
$ 
$ cat abc1.txt
NAME1    ADDRESS
A    ADD1
X    ADD2
$ 
$ awk -f rearrange.awk abc1.txt
ID    NAME1    NAME2    EMAIL    ADDRESS    TEL
    A            ADD1    
    X            ADD2    
$ 
$ 
$ cat abc2.txt
ADDRESS    NAME2    TEL    NAME1
ADD1    B    1234    A
ADD2    Y    5678    X
$ 
$ 
$ awk -f rearrange.awk abc2.txt
ID    NAME1    NAME2    EMAIL    ADDRESS    TEL
    A    B        ADD1    1234
    X    Y        ADD2    5678
$ 
$ 

Sorry, my output data isn't aligned because the TABs are lost when I copy from my terminal emulator and paste it over here. And I am too lazy to add the TABs manually.

tyler_durden

Last edited by durden_tyler; 10-05-2010 at 11:40 PM..
# 11  
Old 10-06-2010
Yoz Tyler, no problem with the TAB. Cant thank you enuf for the code already, how can i fuss over the formatting. Smilie

Anyway, I am executing your code, it seems like my system not able to parse these 2 lines correctly. Any idea?

printf j==1 ? main[j] : OFS main[j]
printf j==1 ? template[j] : OFS template[j]

Cheers

---------- Post updated at 09:55 AM ---------- Previous update was at 12:24 AM ----------

Hi Tyler,

I managed to work around this error by using if-else statement:

if (j==1)
printf main[j]
else
printf OFS main[j]

So the statement executed and printed OK now!
However, if abc.txt contains record with spaces, then it won't print correctly.

For example:

Field "USA SpringField" will print as "USA". I have no background in Unix but I think it is to be related to the printf statement.

Is it something i need to "tune" the printf to get it correctly? Let me google it in the meantime.

Thanks
# 12  
Old 10-06-2010
Replace:
IFS="\t"
with
FS="\t"
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Extract data from a log file and put it in a file

Hi, I would like to seek your help for a script that will extract data from log file and put it in a file. Sample log file 2018-10-23 12:33:21 AI ERROR -- tpid: SAMPLE_TH account: 123456789 aiSessionNumber: 660640464 mapName: xxx to yyy errorDesc: Translation Error:ErrorNumber : 993 ... (2 Replies)
Discussion started by: neverwinter112
2 Replies

2. Shell Programming and Scripting

Script to find blank records in a file except for few columns

I have a file with the following format: X|High|2|GIC|DM||XHM|||6 Months X|Moderate|2|GIC|DM||XHM|||6 Months X|High|2|GCM|DM||XSF|||6 Months X|Med|2|GCM|DM||XSF|||6 Here there are ten columns but I need to print rows having blank records in any of the rows (except for 6th,8th and 9th... (10 Replies)
Discussion started by: chatwithsaurav
10 Replies

3. Shell Programming and Scripting

Remove blank columns from a tab delimited text file

Hello, I have some tab delimited files that may contain blank columns. I would like to delete the blank columns if they exist. There is no clear pattern for when a blank occurs. I was thinking of using sed to replace instances of double tab with blank, sed 's/\t\t//g' All of the examples... (2 Replies)
Discussion started by: LMHmedchem
2 Replies

4. Shell Programming and Scripting

Inserting blank columns in already present CSV file

Hi, i have a csv file which have headers and values of it like below : headers --> CI Ref SerialNumber LastScanDate values --> VMware-42,VMware-42,Tue, 20 May 2014 11:03:44 +0000 i want to have a above csv in below format : headers --> CI Name CI Description CI Ref... (6 Replies)
Discussion started by: omkar.jadhav
6 Replies

5. Shell Programming and Scripting

How to sort a text file if certain columns are blank?

Dear all, I am trying to sort a text file based on column 3, 10, 11 and 12. But certain column are blank for some lines. Column 3 has to be in ascending order after sorting. Part of my input file is as follows: CN727990 1 A01 4703 5083 73.28 - A_scaffold000011 4365605 4365985 73.28 +... (10 Replies)
Discussion started by: huiyee1
10 Replies

6. UNIX for Dummies Questions & Answers

Rsync copy files if dont exist

I have a setup where I have two drives. TV TVbackup For what ever reason, I have a lot of content on my TVbackup drive which isn't on my TV drive. I want to copy all the files across which are on TVbackup but are not currently on TV. If there is a file with the same name but a... (2 Replies)
Discussion started by: Spadez
2 Replies

7. Shell Programming and Scripting

ls | grep (i dont know what to put here)

Dear users, I googled for a while, but i have got a lot of different answers regarding a simple unix command. lets say there are a lot of files in a directory. How can i list the files in a directory whose file types is "text"? Thank you in advance (4 Replies)
Discussion started by: kevincobain2000
4 Replies

8. Shell Programming and Scripting

Removing blank columns from a file

Hi, I have a sample file as shown below -- id parent name dba -----------------------------------... (7 Replies)
Discussion started by: sumirmehta
7 Replies

9. Shell Programming and Scripting

How to extract visually blank lines form the file

Hi, Could some one help me to get rid of visually blank lines from a file using shell or awk or sed (on Solaris machine)? When I use grep grep -v ^$ inputfile >outputfile it removes some blank lines.. but it seems some tab plus space balnk lines remains. thaen I used "grep -v '^]*$' ... (1 Reply)
Discussion started by: hadsuresh
1 Replies

10. UNIX for Advanced & Expert Users

Tab Completion showing files that Dont Exist

Has anyone seen this before? if i ls a particular directory (an email Maildir) i get a long list of files, if i 'du -h' that same directory, i get messages like: du: `./1053894791.17207_0.srweb,S=6199': No such file or directory strange. so using tab will complete this filename too, giving... (3 Replies)
Discussion started by: dittonamed
3 Replies
Login or Register to Ask a Question