Text manipulation help


 
# 1  
Old 06-22-2016

Hello again,

I have a problem manipulating a large text document and there is no way I could edit this document by hand.

Format is:

Code:
Address : XXXX N 37 Ave, Hollywood, FL, 33021
Phone: XXX3190XXX
Player: XXXXXX
Character: Jaramillo 
DOB[mm-dd-yyyy]: June-14-1995
-----
Name: Alexandra
Ticket #: XXXXXXXXXXXXXXXXX
Creation Date: 03-2015
Ref: 299XXXXXX
====================
IP: 73.XXXXXXXX - 73.XXXXXXXX
Submited on: Wednesday 22nd of June 2016 04:31:28 PM



Address : XXXX XXXXXX SE, Washington (city), DC (state), 20032 (zipcode)
Phone: XXX5954XXX
Player: XXXXXX
Character: Hernandez
DOB[mm-dd-yyyy]: April-24-1986
-----
Name: GXX(1) JonXX(2)
Ticket #: XXXXXXXXXXXXXXXXX
Creation Date: 03-2016
Ref: 449XXXXXX
====================
IP: 66.44.XXXX - XXXXX.md.cable.rcn.com
Submited on: Monday 20th of June 2016 05:50:03 PM

And so on...

I want the whole list to be like this (only the values from the specific fields):

Code:
Ticket #|Creation Month|Creation Year|Ref|Name|Address|City|State|Zipcode|Phone|Player|DOB|IP

Is this possible? Please note that the fields vary in length from record to record.

Thank you.

Last edited by Don Cragun; 06-22-2016 at 10:05 PM.. Reason: Add CODE tags for sample output header.
# 2  
Old 06-22-2016
Please show us the output you are hoping to get from the sample data you provided. The formats shown for some of the fields do not match the heading preceding them and I can't tell if you are trying to copy data given or perform data translation functions as well as text manipulations.

And please post your output in CODE tags too, not just your input.

What code have you come up with while trying to do this on your own?
# 3  
Old 06-23-2016
Quote:
Originally Posted by galford
Save as example2pipe.pl
Run as perl example2pipe.pl galford.file

Code:
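#!/usr/bin/perl
use strict;
use warnings;

my %info;

# Print the collected fields as one pipe-delimited line, then reset.
sub flush_record {
    return unless %info;
    my @output = (
        $info{'Ticket'}, (split /-/, $info{'Creation'}),
        @info{'Ref', 'Name'}, (split /,\s?/, $info{'Address'}),
        @info{'Phone', 'Player', 'DOB', 'IP'},
    );
    print join("|", @output), "\n";
    %info = ();
}

while (<>) {
    if (/^Address/ .. /^Submited/) {
        # Key: first word of the label. Value: everything after the first colon.
        $info{$1} = $2 if /^(\w+)[^:]*:\s?(.*)$/;
    }
    else {
        flush_record();
    }
}
flush_record();    # last record, in case the file does not end with a blank line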

Output example:
Code:
XXXXXXXXXXXXXXXXX|03|2015|299XXXXXX|Alexandra|XXXX N 37 Ave|Hollywood|FL|33021|XXX3190XXX|XXXXXX|June-14-1995|73.XXXXXXXX - 73.XXXXXXXX
XXXXXXXXXXXXXXXXX|03|2016|449XXXXXX|GXX(1) JonXX(2)|XXXX XXXXXX SE|Washington (city)|DC (state)|20032 (zipcode)|XXX5954XXX|XXXXXX|April-24-1986|66.44.XXXX - XXXXX.md.cable.rcn.com
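Because the Address value is split on commas, an address with more or fewer than three commas would shift every later column. A minimal sanity check, assuming the output was saved to a file: the function name `check_fields` is made up for this sketch, and 13 is simply the number of columns in the requested header.

```shell
# check_fields FILE: report every line that does not have exactly
# 13 pipe-separated fields (hypothetical helper, not part of the script above)
check_fields() {
    awk -F'|' 'NF != 13 { printf "line %d: %d fields\n", NR, NF }' "$1"
}
```

Run it as `check_fields out.txt`; no output means every line has the expected number of fields.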


Last edited by Aia; 06-23-2016 at 10:21 AM.. Reason: one line to populate info
# 4  
Old 06-23-2016
This is not too elegant in awk, because the Address and Creation Date values each have to be split into several output fields:
Code:
awk '
NR==1                   {HD="Ticket #|Creation Month|Creation Year|Ref|Name|Address |City|State|Zipcode|Phone|Player|DOB|IP"

                         print HD                               # print it
                         sub ("Month[|]Creation Year", "Date", HD)      # [|] matches a literal "|" in any awk
                         sub ("[|]City[|]State[|]Zipcode", "", HD)
                         gsub ("[|]", ",", HD)
                         HDCnt = split(HD, HDArr, ",")          # HDArr n HDCnt needed later for extracting and printing
                         HD    = "," HD ","
                        }

function PRT()          {DL = ""                                # clear delimiter

                         for (i=1; i<=HDCnt; i++)       {printf "%s%s", DL, RES[HDArr[i]]       # print fields in sequence, plus delimiter
                                                         DL=SEP                                 # set delimiter
                                                        }
                         printf "\n"
                         delete RES                             # clear for next record
                        }

NF == 0                 {PRT()                                  # empty line means: print complete record
                        }

HD ~ "," $1 ","         {gsub (",", SEP, $NF)                   # prepare Address field
                         if ($1 ~ "Date") sub ("-","|", $NF)    # prepare Creation field
                         RES[$1] = $NF                          # save it for print
                        }

END                  {PRT()}                                    # print last record
' FS="[[:]" SEP="|" file

Output:
Ticket #|Creation Month|Creation Year|Ref|Name|Address |City|State|Zipcode|Phone|Player|DOB|IP
 XXXXXXXXXXXXXXXXX| 03|2015| 299XXXXXX| Alexandra| XXXX N 37 Ave| Hollywood| FL| 33021| XXX3190XXX| XXXXXX| June-14-1995| 73.XXXXXXXX - 73.XXXXXXXX
 XXXXXXXXXXXXXXXXX| 03|2016| 449XXXXXX| GXX(1) JonXX(2)| XXXX XXXXXX SE| Washington (city)| DC (state)| 20032 (zipcode)| XXX5954XXX| XXXXXX| April-24-1986| 66.44.XXXX - XXXXX.md.cable.rcn.com

Note that the record separator is expected to be exactly one empty line; each additional empty line makes PRT() print a record of empty fields.
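If the records may be separated by more than one empty line, awk's paragraph mode (`RS=""`) is one way around that restriction. The following is only a sketch under assumptions: the function name `to_pipe` is made up, the labels are taken to be exactly those in the sample, the first colon on a line is taken to end the label, and the address is taken to contain exactly three commas. Unlike the script above, it trims the blanks around each value and prints no header line.

```shell
# to_pipe FILE: print one pipe-delimited line per record. Records are blocks
# of lines separated by one or more empty lines (awk paragraph mode).
to_pipe() {
    awk '
    BEGIN { RS = ""; FS = "\n"; OFS = "|" }
    {
        split("", rec)                          # clear the per-record array
        for (i = 1; i <= NF; i++) {
            p = index($i, ":")
            if (p == 0) continue                # separator lines such as "-----"
            key = substr($i, 1, p - 1)
            val = substr($i, p + 1)
            sub(/[^A-Za-z].*$/, "", key)        # keep the first word: "Ticket #" becomes "Ticket"
            gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim surrounding blanks
            rec[key] = val
        }
        if (!("Ticket" in rec)) next            # skip blocks that are not records
        split(rec["Creation"], d, "-")          # "03-2015" gives month and year
        split(rec["Address"], a, /, */)         # street, city, state, zipcode
        print rec["Ticket"], d[1], d[2], rec["Ref"], rec["Name"],
              a[1], a[2], a[3], a[4],
              rec["Phone"], rec["Player"], rec["DOB"], rec["IP"]
    }' "$1"
}
```

Run it as `to_pipe file`. In paragraph mode, leading and repeated empty lines are ignored, so the number of empty lines between records no longer matters.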