Text processing in UNIX


 
# 1  
Old 06-04-2018

Greetings!

I have a text file that I am trying to process to get the desired output, but it looks like I will need the community's help.

Input File:


Code:
a|x|london|consumer|consumer1|country||D|consumer|consumer1|country||1
a|x|paris|consumer|consumer2|country||D|consumer1|consumer2|country||2
a|x|paris|consumer3|consumer2|country||D|consumer1|consumer2|country||2
a|x|spain|id|sys|id|country|U|consumer|sys|id|country|3
b|x|spain|hash|sys|id|country|U|consumer|sys|id|country|3
b|x|spain|tkn|sys|id|country|U|consumer|sys|id|country|3
b|x|spain|txt|sys|id|country|U|consumer|sys|id|country|3


I need to read a file similar to the example above and produce the following output. The output needs to be grouped by the number at the end of each line: lines that share the same number must be combined into one.

Output:

Code:
a|x|london|consumer=consumer|consumer1=consumer1 and country=country|1
a|x|paris|consumer=consumer1, consumer3=consumer1|consumer2 = consumer2 and country=country|2
a|x|spain|id=consumer, hash=consumer,tkn=consumer,txt=consumer|sys=sys and id=id and country=country|3

In other words, for each number grouping in the file we do this:

Code:
$1|$2|$3|$4=$9[,$4=$9 if we have more than one line]|$5=$10[and $6=$11 and $7=$12 if they have some value in their fields]| $13

Here is what I tried:


Code:
key=$(head -n 1 filename.out | awk -F'|' '{ print $1"|"$2"|"$3 }')
value1=""
value2=""

while read -r p; do
  anon1=$(echo "$p" | awk -F'|' '{ print $4"="$9 }')
  if [ "$value1" != "" ]; then value1="${value1}, ${anon1}"; else value1="${anon1}"; fi
  anon2=$(echo "$p" | awk -F'|' '{ print $5"="$10" and "$6"="$11" and "$7"="$12 }')
done < filename.out

if [ "$value2" != "" ]; then value2="${value2} and ${anon2}"; else value2="${anon2}"; fi
echo "$key|$value1|$value2"


The above code works, but I want to loop this through the rest of the file, where the lines are grouped by the number at the end, as in the example input above. If I run the code on that input, it combines all the lines, which is not the desired output. It should only combine the lines that share the same number at the end of each line.

Thank you very much.
# 2  
Old 06-20-2018
You must delay the printout until the last field changes, and then print the stored values. At the end of the file you must do it once more, therefore a function is appropriate.
Code:
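# Print the stored group, if there is one.
prtout(){
  [[ -n $lastf13 ]] && echo "$key|$out1|$out2|$lastf13"
}

out1=""
out2=""
lastf13=""
# -r keeps backslashes literal; f14 only absorbs any surplus fields.
while IFS="|" read -r f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14
do
  # Quote the right-hand side so it is compared as a string, not a pattern.
  if [[ $f13 != "$lastf13" ]]
  then
    # New group: flush the previous one and start collecting again.
    prtout
    key="$f1|$f2|$f3"
    out1="$f4=$f9"
    out2="$f5=$f10"
    [[ -n $f6 ]] && out2="$out2 and $f6=$f11"
    [[ -n $f7 ]] && out2="$out2 and $f7=$f12"
    lastf13=$f13
  else
    # Same group: append another $4=$9 pair.
    out1="$out1,$f4=$f9"
  fi
done < filename.out
# Flush the last group.
prtout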

# 3  
Old 06-21-2018
The shell has the advantage that you can easily extend it with external commands.
If you do not need that, the task is ideal for awk, which loops over the lines and splits the fields automatically.
Code:
# shell code with embedded awk 'script'
awk '
  BEGIN { FS=OFS="|" }
# FS is the field delimiter for the input
# OFS is the field delimiter used by the print command
  function prtout(){
    if (lastf13!="") print key,out1,out2,lastf13
  }
# main loop
  {
    if ($13!=lastf13) {
      prtout()
      key=($1 FS $2 FS $3)
      out1=($4 "=" $9)
      out2=($5 "=" $10)
      if ($6!="") out2=(out2 " and " $6 "=" $11)
      if ($7!="") out2=(out2 " and " $7 "=" $12)
      lastf13=$13
    } else {
      out1=(out1 "," $4 "=" $9)
    }
  }
  END { prtout() }
' < filename.out
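
Both the shell loop and this awk script compare each line only with the previous one, so they rely on lines with the same last field being adjacent, as in the sample input. If the file might not be grouped that way, one option is to pre-sort it on field 13 first. This is only a sketch: the function name sort_by_last is made up for illustration, and the -s (stable) option is an extension found in GNU and BSD sort rather than something POSIX guarantees.

```shell
# Sketch: sort numerically on field 13 only, keeping lines with equal
# keys in their original relative order (-s = stable sort, an extension
# available in GNU and BSD sort). sort_by_last is an illustrative name.
sort_by_last() {
  sort -s -t'|' -k13,13n "$@"
}
```

The sorted lines can then be piped into the awk script above in place of the `< filename.out` redirection, for example `sort_by_last filename.out | awk '...'`. Because the sort is stable, the first line of each group still supplies the key.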

# 4  
Old 06-21-2018
If your input file is not sorted on the last field, awk can build up arrays for the fields and output at the end:
Code:
awk -F'|' '{
    if ($13 in k)
        d1[$13] = d1[$13] "," $4 "=" $9
    else {
        d1[$13] = $4 "=" $9
        k[$13] = $1 FS $2 FS $3
        d2[$13] = $5 "=" $10
        if ($6!="") d2[$13] = d2[$13] " and " $6 "=" $11
        if ($7!="") d2[$13] = d2[$13] " and " $7 "=" $12
    }
}
END {
   for(id in k)
      print k[id] FS d1[id] FS d2[id] FS id
}' filename.out
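
One caveat: awk does not define the order in which `for (id in k)` visits the keys, so the groups may be printed in any order. If the order of first appearance matters, a minimal sketch along these lines records each new key in a separate indexed array. The function name group_by_last and the names order and n are my own illustrative choices, not part of the solution above.

```shell
# Sketch: same grouping as above, but groups are printed in the order
# in which their key (field 13) first appears in the input.
group_by_last() {
  awk -F'|' '
    # First line of a group: remember the key order and start the parts.
    !($13 in k) {
      order[++n] = $13
      k[$13]  = $1 FS $2 FS $3
      d1[$13] = $4 "=" $9
      d2[$13] = $5 "=" $10
      if ($6 != "") d2[$13] = d2[$13] " and " $6 "=" $11
      if ($7 != "") d2[$13] = d2[$13] " and " $7 "=" $12
      next
    }
    # Later lines of a known group: append another $4=$9 pair.
    { d1[$13] = d1[$13] "," $4 "=" $9 }
    END {
      # Walk the keys by their recorded index, not with for (id in k).
      for (i = 1; i <= n; i++) {
        id = order[i]
        print k[id] FS d1[id] FS d2[id] FS id
      }
    }
  ' "$@"
}
```

It would be called as `group_by_last filename.out`. With no file argument it reads standard input.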
