File editing with shell programming

# 1  
07-10-2005

Hello,

I have several hundred text files. Each file is formatted like this:

column1 column2 column3 column4 column5
id1 definition1 name1 fieldid comm1
id2 definition2 name2 fieldid comm2
id3 definition3 name3 fieldid comm3
id4 definition4 name2 fieldid comm2
id5 definition5 name1 fieldid comm1
id6 definition6 name4 fieldid comm4
...
...

What I need to do is extract column3, remove the duplicates, and transpose it into a new tab-delimited file in which each row starts with the name of the original file. The new file should look like:

file1 name1 name2 name3 ....
file2 name-a name-b name-c ....
file3 name11 name21 name-x ...
...
...

I am new to shell scripting, awk, sed, etc. Could anyone help me with this problem?

Thanks in advance.

Steve
# 2  
07-10-2005
Code:
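for file in input*; do
   # val[] holds one entry per distinct column-3 value; "for (i in val)" visits them in no particular order
   awk '{ val[$3] = 1 }
        END { printf "%s", FILENAME; for (i in val) printf "\t%s", i; printf "\n" }' "$file" >> newfile
done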

# 3  
07-11-2005
Thanks so much. It works well if the original files don't have spaces in their contents. But how do I specify a tab- or comma-delimited file as input?

Thanks again.
# 4  
07-11-2005
I modified the code slightly by putting FS="\t" ahead of val and it worked! When I ran it, I hit two more problems. First, the header row in each file should be omitted. Second, there are some records like "---" which shouldn't be included. Can these be solved? Thanks.
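A note on specifying the delimiter: the usual approach is to set FS before awk reads any input, either with -F on the command line or in a BEGIN block. In POSIX awk, an assignment to FS inside the main action only changes how the next record is split, so the first record awk reads is still split on blanks. A minimal sketch follows; col3_distinct is a hypothetical helper name, and the comma case assumes simple CSV with no quoted fields that contain commas.

```shell
# col3_distinct (hypothetical helper): print the distinct values of column 3
# in one file, in first-seen order. $1 is the field separator, $2 the file.
col3_distinct() {
  # -F sets FS before the first record is read. POSIX awk processes escape
  # sequences in the -F value, so passing '\t' selects a tab.
  awk -F "$1" '!seen[$3]++ { print $3 }' "$2"
}
```

For example, col3_distinct '\t' data.tsv for tab-delimited input, or col3_distinct , data.csv for comma-delimited input (both file names are placeholders).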
# 5  
07-11-2005
Quote:
Originally Posted by ssshen@mit.edu
I modified the code slightly by putting FS="\t" ahead of val and it worked! When I ran it, I hit two more problems. First, the header row in each file should be omitted. Second, there are some records like "---" which shouldn't be included. Can these be solved? Thanks.
Or do it all in a single awk program:

nawk -f shen.awk input*

shen.awk:
Code:
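BEGIN {
  FS = OFS = "\t"    # input and output are both tab-delimited
}
# FNR > 1 skips each file's header row; /^-+$/ skips "---" placeholders;
# seen[] keeps only the first occurrence of a name within a file
FNR > 1 && $3 !~ /^-+$/ && !seen[FILENAME, $3]++ {
  if (!(FILENAME in row)) order[++n] = FILENAME
  row[FILENAME] = row[FILENAME] OFS $3
}
END {
  # one line per file, in the order the files were read:
  # the file name, then its distinct column-3 values
  for (k = 1; k <= n; k++)
    print order[k] row[order[k]]
}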
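For comparison, the same kind of table can also be built from standard text utilities instead of a single awk program. This is a different technique from the one posted above, sketched under the assumption that every file is tab-delimited with exactly one header line; make_rows is a hypothetical helper name.

```shell
# make_rows (hypothetical helper): one output line per file, holding the file
# name followed by that file's distinct column-3 values, tab-separated.
make_rows() {
  for file in "$@"; do
    # tail: drop the header row; cut: take column 3 (cut splits on tabs by
    # default); grep: drop "---" placeholders and empty values;
    # sort -u: distinct names, sorted; paste -s: join them with tabs
    names=$(tail -n +2 "$file" | cut -f3 | grep -v '^-*$' | sort -u | paste -s -)
    printf '%s\t%s\n' "$file" "$names"
  done
}
```

Run it as make_rows input* > newfile. Because the redirection happens once, rerunning the command replaces newfile instead of appending to it. Note that the names in each row come out sorted (the order depends on the locale's collation), and every input file produces a line, even if no names remain after filtering.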
