Pairing the nth elements on multiple lines iteratively Post: 302951115

Post 302951115 by John Lyon on Sunday 2nd of August 2015 07:19:46 PM

Thanks for the reply, and apologies for being vague; I'm new to all this. To be clear, the following input consists of two example sentences. The \gla lines of these two examples contain a combined total of 15 curly-bracket-enclosed words (3 in the first, 12 in the second):

Code:

\gla {itl\'i\textglotstop } {k\textsuperscript{w}uk\textsuperscript{w}} {t\textschwa cx\textsuperscript{w}\'u\texthalflength\texthalflength y.}//  
\glb {itl\'i\textglotstop } {k\textsuperscript{w}uk\textsuperscript{w}} {tc+\ts x\textsuperscript{w}\'uy}// 
\glc \textsc{dem} \textsc{rep} \textsc{loc}+go // 
\glc from.there they.say came.over.this.way //
\glft `They said he was coming along.' //  

\gla {u\textbeltl } {cut} {k\textsuperscript{w}uk\textsuperscript{w}} {al\'a\textglotstop } {lut} {i\textglotstop } {q\'aqx\textsuperscript{w}\textschwa lx} {ka\textglotstop} {cx\textsuperscript{w}uys} {i\textglotstop } {l} {siw\textbeltl k\textsuperscript{w}.} // 
\glb {u\textbeltl } {cut} {k\textsuperscript{w}uk\textsuperscript{w}} {al\'a\textglotstop } {lut} {i\textglotstop } {q\'a(\tb)\ts qx\textsuperscript{w}lx} {ki\textglotstop} {c\textendash \ts x\textsuperscript{w}uy\textendash s} {i\textglotstop } {l} {siw\textbeltl k\textsuperscript{w}} // 
\glc \textsc{conj} say \textsc{rep} \textsc{dem} \textsc{neg} \textsc{det} fish \textsc{comp.obl} \textsc{cust}\textendash go\textendash \textsc{3sg.poss} \textsc{det} \textsc{loc} water // 
\glc and he.said they.say here no the fish where.that they.come the through water // 
\glft `Coyote said there will be no fish going through the water here.' //

Given this input, this is the initial output I'm looking for:

Code:

{itl\'i\textglotstop } & {itl\'i\textglotstop } & \textsc{dem} & from.there \\
{k\textsuperscript{w}uk\textsuperscript{w}} & {k\textsuperscript{w}uk\textsuperscript{w}} &  \textsc{rep} & they.say \\
{t\textschwa cx\textsuperscript{w}\'u\texthalflength\texthalflength y.} & {tc+\ts x\textsuperscript{w}\'uy} & \textsc{loc}+go &  came.over.this.way \\
{u\textbeltl } & {u\textbeltl } & \textsc{conj} &  and \\
{cut} & {cut} & say & he.said \\
{k\textsuperscript{w}uk\textsuperscript{w}} & {k\textsuperscript{w}uk\textsuperscript{w}} &  \textsc{rep} & they.say \\
{al\'a\textglotstop } &  {al\'a\textglotstop } & \textsc{dem} & here \\
{lut} & {lut} & \textsc{neg} & no \\
{i\textglotstop } & {i\textglotstop } & \textsc{det} & the \\
{q\'aqx\textsuperscript{w}\textschwa lx} & {q\'a(\tb)\ts qx\textsuperscript{w}lx} & fish & fish \\
{ka\textglotstop} & {ki\textglotstop} & \textsc{comp.obl} & where.that \\
{cx\textsuperscript{w}uys} & {c\textendash \ts x\textsuperscript{w}uy\textendash s} &  \textsc{cust}\textendash go\textendash \textsc{3sg.poss} & they.come \\
{i\textglotstop } & {i\textglotstop } & \textsc{det} & the \\
{l} & {l} & \textsc{loc} & through \\
{siw\textbeltl k\textsuperscript{w}} &  {siw\textbeltl k\textsuperscript{w}} & water & water \\
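
A minimal sketch of this pairing step, written in Python rather than awk. It assumes that every gloss line starts with a name such as \gla and ends with //, that a new \gla line starts a new example, and that a blank ends a word only when it is outside all braces and does not directly follow a LaTeX control word such as \textendash. The names split_words and pair_words are made up for illustration.

```python
import re

# A blank stays inside a word when it follows a LaTeX control word
# (backslash plus letters), because LaTeX itself ignores that blank.
AFTER_CONTROL_WORD = re.compile(r"\\[A-Za-z]+\s*$")
# Matches lines such as "\gla ... //"; \glft lines do not match.
GLOSS_LINE = re.compile(r"^\\gl[a-z]\s+(.*?)\s*//\s*$")


def split_words(body):
    # Split one gloss line into words at blanks outside any braces.
    words, current, depth = [], "", 0
    for ch in body:
        if ch.isspace() and depth == 0 and not AFTER_CONTROL_WORD.search(current):
            if current:
                words.append(current)
                current = ""
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        current += ch
    if current:
        words.append(current)
    return words


def pair_words(text):
    # Build one "w1 & w2 & ..." table row per word position of each example.
    rows, block = [], []

    def flush():
        # zip(*block) groups the nth word of every collected line.
        for group in zip(*block):
            rows.append(" & ".join(group) + r" \\")
        block.clear()

    for line in text.splitlines():
        line = line.strip()
        match = GLOSS_LINE.match(line)
        if not match:
            continue
        if line.startswith(r"\gla"):
            flush()  # a new \gla line starts a new example
        block.append(split_words(match.group(1)))
    flush()
    return rows


example = r"""
\gla {itl\'i\textglotstop } {k\textsuperscript{w}uk\textsuperscript{w}} {t\textschwa cx\textsuperscript{w}\'u\texthalflength\texthalflength y.}//
\glb {itl\'i\textglotstop } {k\textsuperscript{w}uk\textsuperscript{w}} {tc+\ts x\textsuperscript{w}\'uy}//
\glc \textsc{dem} \textsc{rep} \textsc{loc}+go //
\glc from.there they.say came.over.this.way //
\glft `They said he was coming along.' //
"""
for row in pair_words(example):
    print(row)
```

Because zip stops at the shortest line, an example whose lines have different word counts would silently lose its trailing words; a stricter version could compare the counts first.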

Next, these 15 lines (one for each of the 15 words in the \gla lines of the two unmodified examples) would be sorted, the sort key being the first letter of the first word in each line:

Code:

{al\'a\textglotstop } &  {al\'a\textglotstop } & \textsc{dem} & here \\
{cut} & {cut} & say & he.said \\
{cx\textsuperscript{w}uys} & {c\textendash \ts x\textsuperscript{w}uy\textendash s} &  \textsc{cust}\textendash go\textendash \textsc{3sg.poss} & they.come \\
{itl\'i\textglotstop } & {itl\'i\textglotstop } & \textsc{dem} & from.there \\
{i\textglotstop } & {i\textglotstop } & \textsc{det} & the \\
{i\textglotstop } & {i\textglotstop } & \textsc{det} & the \\
{ka\textglotstop} & {ki\textglotstop} & \textsc{comp.obl} & where.that \\
{k\textsuperscript{w}uk\textsuperscript{w}} & {k\textsuperscript{w}uk\textsuperscript{w}} &  \textsc{rep} & they.say \\
{k\textsuperscript{w}uk\textsuperscript{w}} & {k\textsuperscript{w}uk\textsuperscript{w}} &  \textsc{rep} & they.say \\
{l} & {l} & \textsc{loc} & through \\
{lut} & {lut} & \textsc{neg} & no \\
{q\'aqx\textsuperscript{w}\textschwa lx} & {q\'a(\tb)\ts qx\textsuperscript{w}lx} & fish & fish \\
{siw\textbeltl k\textsuperscript{w}} &  {siw\textbeltl k\textsuperscript{w}} & water & water \\
{t\textschwa cx\textsuperscript{w}\'u\texthalflength\texthalflength y.} & {tc+\ts x\textsuperscript{w}\'uy} & \textsc{loc}+go &  came.over.this.way \\
{u\textbeltl } & {u\textbeltl } & \textsc{conj} &  and \\
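
A rough Python sketch of the sorting step, assuming each row's first word starts with a plain ASCII letter right after its opening brace (true of every row in this post). The helper name first_letter is hypothetical, and the rows are a few taken from the listing above.

```python
import re


def first_letter(row):
    # Sort key: the first ASCII letter in the row, lowercased.
    # Assumes the first word begins with a letter, not a LaTeX command.
    match = re.search(r"[A-Za-z]", row)
    return match.group(0).lower() if match else ""


rows = [
    r"{itl\'i\textglotstop } & {itl\'i\textglotstop } & \textsc{dem} & from.there \\",
    r"{u\textbeltl } & {u\textbeltl } & \textsc{conj} & and \\",
    r"{cut} & {cut} & say & he.said \\",
    r"{al\'a\textglotstop } & {al\'a\textglotstop } & \textsc{dem} & here \\",
    r"{i\textglotstop } & {i\textglotstop } & \textsc{det} & the \\",
]

# sorted() is stable: rows with the same first letter keep their input order.
for row in sorted(rows, key=first_letter):
    print(row)
```

A first-letter key alone does not fix the order of rows that share a letter, so the order within a letter here simply follows the input and may differ from the hand-sorted listing above (for example, for the rows beginning with k).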

Lines 5 and 6 and lines 8 and 9 of the sorted list above are duplicates, so one line of each pair would be removed, yielding 13 lines:

Code:

{al\'a\textglotstop } &  {al\'a\textglotstop } & \textsc{dem} & here \\
{cut} & {cut} & say & he.said \\
{cx\textsuperscript{w}uys} & {c\textendash \ts x\textsuperscript{w}uy\textendash s} &  \textsc{cust}\textendash go\textendash \textsc{3sg.poss} & they.come \\
{itl\'i\textglotstop } & {itl\'i\textglotstop } & \textsc{dem} & from.there \\
{i\textglotstop } & {i\textglotstop } & \textsc{det} & the \\
{ka\textglotstop} & {ki\textglotstop} & \textsc{comp.obl} & where.that \\
{k\textsuperscript{w}uk\textsuperscript{w}} & {k\textsuperscript{w}uk\textsuperscript{w}} &  \textsc{rep} & they.say \\
{l} & {l} & \textsc{loc} & through \\
{lut} & {lut} & \textsc{neg} & no \\
{q\'aqx\textsuperscript{w}\textschwa lx} & {q\'a(\tb)\ts qx\textsuperscript{w}lx} & fish & fish \\
{siw\textbeltl k\textsuperscript{w}} &  {siw\textbeltl k\textsuperscript{w}} & water & water \\
{t\textschwa cx\textsuperscript{w}\'u\texthalflength\texthalflength y.} & {tc+\ts x\textsuperscript{w}\'uy} & \textsc{loc}+go &  came.over.this.way \\
{u\textbeltl } & {u\textbeltl } & \textsc{conj} &  and \\
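
The duplicate removal could be sketched in Python as below. Treating two rows as duplicates when they differ only in the amount of blank space is my assumption, and drop_duplicates is a made-up name.

```python
def drop_duplicates(rows):
    # Keep the first occurrence of each row, preserving the sorted order.
    seen = set()
    unique = []
    for row in rows:
        # Collapse runs of blanks so spacing differences do not matter.
        key = " ".join(row.split())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    r"{i\textglotstop } & {i\textglotstop } & \textsc{det} & the \\",
    r"{i\textglotstop } & {i\textglotstop } & \textsc{det} & the \\",
    r"{ka\textglotstop} & {ki\textglotstop} & \textsc{comp.obl} & where.that \\",
    r"{k\textsuperscript{w}uk\textsuperscript{w}} & {k\textsuperscript{w}uk\textsuperscript{w}} &  \textsc{rep} & they.say \\",
    r"{k\textsuperscript{w}uk\textsuperscript{w}} & {k\textsuperscript{w}uk\textsuperscript{w}} & \textsc{rep} & they.say \\",
]

for row in drop_duplicates(rows):
    print(row)
```

Because the rows are already sorted, duplicates sit next to each other, but the set-based check also works on unsorted input.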

The result will be an alphabetized vocabulary list, ready to be dropped into a "tabularx" table environment in LaTeX.

I had some luck with danmero's suggestion:

Code:

awk 'END{for(l=1;l++<NF;)print o[l]}{for(l=I;l++<NF;){o[l]=((o[l])?o[l]FS:S)$l}}' file

However, it only worked if (a) all of the extra blank spaces within "words" were removed (it seems to use blank spaces as the word delimiter), and (b) only one example was processed at a time (I think it assumes that "line names" do not occur more than once). Both of these issues are my fault, for not being clear in my initial post about the nature of the data I'm working with. Also, I don't yet know enough about awk to identify what in the above command needs changing. Thanks for your assistance and patience! I hope this helps to clarify.
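
To illustrate issue (a): awk's default field splitting breaks a line at every run of blanks, so a bracketed word that contains a blank becomes several fields. The Python snippet below only mimics that splitting with str.split(); it is not a fix, just a way to see the mismatch on the first \gla line.

```python
line = r"\gla {itl\'i\textglotstop } {k\textsuperscript{w}uk\textsuperscript{w}} {t\textschwa cx\textsuperscript{w}\'u\texthalflength\texthalflength y.}//"

# str.split() with no argument splits on runs of whitespace,
# much like awk's default field separator.
fields = line.split()

# The line holds 3 bracketed words, but blank-splitting yields 7 fields
# (the line name plus 6 fragments).
for number, field in enumerate(fields, start=1):
    print(number, field)
```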

John Lyon
