Help tabulating file putting repeated strings as headers


 
# 1  
Old 04-11-2018

Hi. Could somebody help me with this?

I'm trying to tabulate the following input file, but the output I'm getting is not the desired one.

I have access to GNU/Linux (Ubuntu) and Cygwin.

Input file
Code:
STAGE = 1
ID = 0
NAME = JFMSC
TYPE = MLRR
DFRUL = PERMISSION
ADDR = 1001
RRUL = PERMISSION
SPRR = TRUE
ISGALW = FALSE
ISUTWD = FALSE

STAGE = 1
ID = 2
NAME = PLLSJS
TYPE = MLRR
DFRUL = PERMISSION

STAGE = 1
ID = 4
NAME = AAAARR
TYPE = MLRR
DFRUL = RESTRICT
ADDR = 3553
RRUL = PERMISSION
SPRR = FALSE
ISGALW = FALSE
ISUTWD = FALSE
ADDR = 66444
RRUL = PERMISSION
SPRR = FALSE
ISGALW = FALSE
ISUTWD = FALSE
ADDR = 890087
RRUL = PERMISSION
SPRR = FALSE
ISGALW = FALSE
ISUTWD = FALSE

STAGE = 1
ID = 0
NAME = PPROOA
TYPE = RRHN
DFRUL = PERMISSION
ADDR = 7034
RRUL = PERMISSION
SPRR = FALSE
ISGALW = FALSE
ISUTWD = FALSE

This is the code I have been able to put together so far:

Code:
awk 'BEGIN{print "STAGE|ID|NAME|TYPE|DFRUL|ADDR|RRUL|SPRR|ISGALW|ISUTWD"}
/ID/{a=$3}
/NAME/{b=$3}
/TYPE/{c=$3}
/DFRUL/{d=$3}
/ADDR/{f=$3}
/RRUL/{g=$3}
/SPRR/{h=$3}
/ISGALW/{i=$3}
/ISUTWD/{j=$3
  print a"|"b"|"c"|"d"|"f"|"g"|"h"|"i"|"j
}
' file.txt

My current output
Code:
STAGE|ID|NAME|TYPE|DFRUL|ADDR|RRUL|SPRR|ISGALW|ISUTWD
0|JFMSC|MLRR|PERMISSION|1001|PERMISSION|TRUE|FALSE|FALSE
4|AAAARR|MLRR|RESTRICT|3553|PERMISSION|FALSE|FALSE|FALSE
4|AAAARR|MLRR|RESTRICT|66444|PERMISSION|FALSE|FALSE|FALSE
4|AAAARR|MLRR|RESTRICT|890087|PERMISSION|FALSE|FALSE|FALSE
0|PPROOA|RRHN|PERMISSION|7034|PERMISSION|FALSE|FALSE|FALSE

Desired output
Code:
STAGE|ID|NAME|TYPE|DFRUL|ADDR|RRUL|SPRR|ISGALW|ISUTWD
1|0|JFMSC|MLRR|PERMISSION|1001|PERMISSION|TRUE|FALSE|FALSE
1|2|PLLSJS|MLRR|PERMISSION|||||
1|4|AAAARR|MLRR|RESTRICT|3553|PERMISSION|FALSE|FALSE|FALSE
|||||66444|PERMISSION|FALSE|FALSE|FALSE
|||||890087|PERMISSION|FALSE|FALSE|FALSE
1|0|PPROOA|RRHN|PERMISSION|7034|PERMISSION|FALSE|FALSE|FALSE

Thanks in advance.

Moderator's Comments:
Please use CODE (not PHP) tags for data (output) as well, as required by forum rules!

Last edited by RudiC; 04-12-2018 at 03:39 AM.. Reason: Changed PHP to CODE tags.
# 2  
Old 04-12-2018
Extending your attempt:
Code:
awk '
BEGIN           {print "STAGE|ID|NAME|TYPE|DFRUL|ADDR|RRUL|SPRR|ISGALW|ISUTWD"
                }
/STAGE/         {x=$3}
/ID/            {a=$3}
/NAME/          {b=$3}
/TYPE/          {c=$3}
/DFRUL/         {d=$3}
/ADDR/          {f=$3}
/RRUL/          {g=$3}
/SPRR/          {h=$3}
/ISGALW/        {i=$3}
/ISUTWD/        {j=$3
                 print x"|"a"|"b"|"c"|"d"|"f"|"g"|"h"|"i"|"j
                 a = b = c = d = e = f = g = h = i = j = x = ""
                }
' file

Output
Code:
STAGE|ID|NAME|TYPE|DFRUL|ADDR|RRUL|SPRR|ISGALW|ISUTWD
1|0|JFMSC|MLRR|PERMISSION|1001|PERMISSION|TRUE|FALSE|FALSE
1|4|AAAARR|MLRR|RESTRICT|3553|PERMISSION|FALSE|FALSE|FALSE
|||||66444|PERMISSION|FALSE|FALSE|FALSE
|||||890087|PERMISSION|FALSE|FALSE|FALSE
1|0|PPROOA|RRHN|PERMISSION|7034|PERMISSION|FALSE|FALSE|FALSE

# 3  
Old 04-12-2018
Thanks so much, RudiC.

Clearing the variables was the key!

In this case there are 10 headers, but if there are N headers whose value I want to take from $3, is there a way to shorten the script and avoid writing a line like this for each of the N headers?

headerNth = /StringNth/ {Nth=$3}

Some kind of loop?
# 4  
Old 04-12-2018
This
Code:
awk -F"[ =]+"  '
BEGIN           {HD = "STAGE|ID|NAME|TYPE|DFRUL|ADDR|RRUL|SPRR|ISGALW|ISUTWD"
                 for (MX=n=split (HD, HDArr, "|"); n>0; n--) SRCH[HDArr[n]]
                 print HD
                }

!NF             {for (i=1; i<=MX; i++) printf "%s%s", RES[HDArr[i]], (i == MX)?ORS:OFS
                 delete RES
                }

$1 in SRCH      {RES[$1] = $2
                }

END             {for (i=1; i<=MX; i++) printf "%s%s", RES[HDArr[i]], (i == MX)?ORS:OFS
                }

' OFS="|" file
STAGE|ID|NAME|TYPE|DFRUL|ADDR|RRUL|SPRR|ISGALW|ISUTWD
1|0|JFMSC|MLRR|PERMISSION|1001|PERMISSION|TRUE|FALSE|FALSE
1|2|PLLSJS|MLRR|PERMISSION|||||
1|4|AAAARR|MLRR|RESTRICT|890087|PERMISSION|FALSE|FALSE|FALSE
1|0|PPROOA|RRHN|PERMISSION|7034|PERMISSION|FALSE|FALSE|FALSE

will extract data based on the header fields and is thus very flexible. Unfortunately you didn't define the record terminator which would define the point at which to print a line. And, the records are non-uniformly structured: you have "incomplete" records (rec 2) and records with duplicate field entries (rec 3). You need to define very clearly and accurately what to extract and what to print, and adapt the script accordingly.
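The non-uniform structure mentioned above can be made visible before any tabulating is attempted. The following is a minimal sketch, not part of the scripts posted in this thread: the function name inspect_records is made up for illustration, and it assumes that every record starts with a STAGE line and that every data line has the form KEY = VALUE with a single-word value.

```shell
# inspect_records: hypothetical helper, a sketch under the assumptions stated above.
# For every record it reports how many distinct keys occur and how often
# the most frequent key is repeated.
inspect_records() {
    awk '
    function report() {
        if (rec) printf "record %d: %d distinct keys, max repeat %d\n", rec, distinct, maxrep
    }
    NF != 3 || $2 != "=" { next }       # keep only KEY = VALUE lines
    $1 == "STAGE" {                     # assumed record initiator
        report()
        rec++; distinct = 0; maxrep = 0
        split("", cnt)                  # portable way to clear an array
    }
    rec {
        if (++cnt[$1] == 1) distinct++
        if (cnt[$1] > maxrep) maxrep = cnt[$1]
    }
    END { report() }
    ' "$@"
}
```

Called as inspect_records file on the input from post #1, this should report 10 distinct keys for records 1, 3 and 4, only 5 for record 2 (the "incomplete" one), and a max repeat of 3 for record 3 (the one with duplicate field entries).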
# 5  
Old 04-12-2018
Code:
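awk '
BEGIN {lines=0; column_count=0}
# Skip every line that is not of the form "KEY = VALUE".
$2 !~ /=/ || NF != 3 {next}
# Register each key once, in order of first appearance, as a column.
! column[$1]++ {columns[column_count++]=$1}
# A STAGE line starts a new record, i.e. a new output row.
$1 == "STAGE" {lines++}
# A repeated key within a record overwrites the earlier value.
{column_data[$1, lines]=$3}
END {
   # Header row, then one row per record; "%s%s" keeps data out of the printf format string.
   for (i=0; i<column_count; i++) printf "%s%s", columns[i], ((i<column_count-1) ? "|" : "\n")
   for (i=1; i <= lines; i++) {
      for (j=0; j < column_count; j++) {
         printf "%s%s", column_data[columns[j], i], ((j<column_count-1) ? "|" : "\n")
      }
   }
}
' infile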


Last edited by rdrtx1; 04-12-2018 at 08:00 PM.. Reason: updated for record initiator (what happens to repeat values within a record?)
# 6  
Old 04-12-2018
Quote:
Originally Posted by RudiC
Unfortunately you didn't define the record terminator which would define the point at which to print a line. And, the records are non-uniformly structured: you have "incomplete" records (rec 2) and records with duplicate field entries (rec 3). You need to define very clearly and accurately what to extract and what to print, and adapt the script accordingly.
I'm always surprised to see an awk script written that way; I barely understand the logic well enough to customize it.

The record initiator will always be the line STAGE = 1.
The record terminator will be the empty line before the next record initiator. The issue is that in the original file, records with duplicate fields (rec 3) contain empty lines before the next record initiator occurs. Also, between a record terminator and the next record initiator there can be more than one empty line and some garbage lines (shown as "some garbaje" in the sample below).

At the end of the file there is an END string.

This would be a better representation of input file:
Code:
some garbaje
some garbaje
some garbaje

STAGE = 1
ID = 0
NAME = JFMSC
TYPE = MLRR
DFRUL = PERMISSION
ADDR = 1001
RRUL = PERMISSION
SPRR = TRUE
ISGALW = FALSE
ISUTWD = FALSE

some garbaje
some garbaje

STAGE = 1
ID = 2
NAME = PLLSJS
TYPE = MLRR
DFRUL = PERMISSION

some garbaje
some garbaje

STAGE = 1
ID = 4
NAME = AAAARR
TYPE = MLRR
DFRUL = RESTRICT
ADDR = 3553
RRUL = PERMISSION
SPRR = FALSE
ISGALW = FALSE
ISUTWD = FALSE

ADDR = 66444
RRUL = PERMISSION
SPRR = FALSE
ISGALW = FALSE
ISUTWD = FALSE

ADDR = 890087
RRUL = PERMISSION
SPRR = FALSE
ISGALW = FALSE
ISUTWD = FALSE

some garbaje
some garbaje

STAGE = 1
ID = 0
NAME = PPROOA
TYPE = RRHN
DFRUL = PERMISSION
ADDR = 7034
RRUL = PERMISSION
SPRR = FALSE
ISGALW = FALSE
ISUTWD = FALSE

---    END


Last edited by Ophiuchus; 04-14-2018 at 11:59 PM..
# 7  
Old 04-14-2018
Quote:
Originally Posted by rdrtx1
what happens to repeat values within a record?

Duplicated fields within a record should be printed on separate lines in the output. Records without repeated fields take only one output line.

See the output below (the lines starting with ||||| are the additional values for record 3) and the input in my post #6.

Code:
STAGE|ID|NAME|TYPE|DFRUL|ADDR|RRUL|SPRR|ISGALW|ISUTWD
1|0|JFMSC|MLRR|PERMISSION|1001|PERMISSION|TRUE|FALSE|FALSE
1|4|AAAARR|MLRR|RESTRICT|3553|PERMISSION|FALSE|FALSE|FALSE
|||||66444|PERMISSION|FALSE|FALSE|FALSE
|||||890087|PERMISSION|FALSE|FALSE|FALSE
1|0|PPROOA|RRHN|PERMISSION|7034|PERMISSION|FALSE|FALSE|FALSE
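One possible way to produce output of this shape from the input in post #6 is sketched below. It is not a solution posted in the thread, and it rests on several assumptions: the function name tabulate is made up, the first header (STAGE) is taken as the record initiator, a new output row is started whenever a key shows up that already has a value in the current row, and every value is a single word. Blank lines, garbage lines and the END marker are ignored because they do not have the form KEY = VALUE. Unlike the listing above, the sketch also prints incomplete records such as ID 2, as in the desired output of post #1.

```shell
# tabulate: hypothetical helper; the header list and row-splitting rule are assumptions.
tabulate() {
    awk -v HD="STAGE|ID|NAME|TYPE|DFRUL|ADDR|RRUL|SPRR|ISGALW|ISUTWD" '
    # Print the pending row (if any) and start a fresh, empty one.
    function emit(   i) {
        if (!pending) return
        for (i = 1; i <= MX; i++)
            printf "%s%s", RES[HDArr[i]], (i == MX ? "\n" : "|")
        split("", RES)          # portable way to clear an array
        pending = 0
    }
    BEGIN {
        MX = split(HD, HDArr, "|")
        for (i = 1; i <= MX; i++) SRCH[HDArr[i]]
        print HD
    }
    # Keep only KEY = VALUE lines whose key is one of the headers.
    NF != 3 || $2 != "=" || !($1 in SRCH) { next }
    # Row boundary: the first header (STAGE) or a key already filled in this row.
    $1 == HDArr[1] || ($1 in RES) { emit() }
    { RES[$1] = $3; pending = 1 }
    END { emit() }
    ' "$@"
}
```

It would be called as tabulate file.txt, or fed through a pipe. To handle N headers, only the HD string needs to change. Values containing spaces would need a different field separator and are not covered here.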
