@ctsgnb
That might work, but again, several of the files have different text in the first part of each document.
How do I write the output from the paste command to a file?
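For what it's worth, the usual answer is plain shell redirection, sketched minimally below. `>` sends paste's standard output to a file and replaces whatever was in it, while `>>` appends instead. The file names are hypothetical placeholders, and the two printf lines only create sample input so the snippet runs on its own.

```shell
#!/bin/sh
# Two small sample inputs (hypothetical names, only for the demo)
printf 'a\nb\n' > file1.txt
printf '1\n2\n' > file2.txt

# '>' redirects paste's standard output into combined.txt, overwriting it;
# use '>>' instead to append to an existing file.
paste file1.txt file2.txt > combined.txt
```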
---------- Post updated at 05:11 PM ---------- Previous update was at 04:50 PM ----------
@shamrock, I tried your script and the output is marvellous, but it is not matching all the documents in each file I'm working on.
The problem is that each file has its information entered differently. In one file, the word "edition" comes before the date, so your script stops printing before the crucial information: the date.
The only thing all the files have in common is that each section is separated by a phrase like "1 of 28 DOCUMENTS" (so the pattern is [0-9]+ of [0-9]+ DOCUMENTS, since [0-9] alone only matches a single digit), followed by some information, followed by a date. The only way I can see to match that text is to grep a few lines after the DOCUMENTS phrase, transpose the rows, and then rearrange the resulting columns (probably in Excel) to get all the dates lined up. I was really very close with my original script, but it would output everything on one line rather than one line per record (a record being the text between two DOCUMENTS lines).
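A rough sketch of the grep-then-transpose idea, done in a single awk pass instead (a substitution for the grep step, not the original script): every line matching the DOCUMENTS pattern starts a new output row, and the next n lines are appended to that row as tab-separated fields. Each record then lands on its own line, and Excel can split the fields into columns. The sample text, the file names sample.txt and records.tsv, and n=3 are all assumptions; raise n until the date falls inside the captured lines in every file.

```shell
#!/bin/sh
# Hypothetical stand-in for one of the real files
cat > sample.txt <<'EOF'
1 of 2 DOCUMENTS
Copyright 2010 Dagbladet Politiken
January 5, 2010
Some article text
2 of 2 DOCUMENTS
Copyright 2010 Dagbladet Politiken
Edition 1
January 6, 2010
More article text
EOF

# n = how many lines after each DOCUMENTS line to keep (an assumption)
awk -v n=3 '
/[0-9]+ of [0-9]+ DOCUMENTS/ {        # start of a new record
    if (row != "") print row           # emit the previous record as one line
    row = $0
    left = n
    next
}
left > 0 {                             # still inside the first n lines
    row = row "\t" $0                  # add the line as a tab-separated field
    left--
}
END { if (row != "") print row }       # emit the last record
' sample.txt > records.tsv
```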
---------- Post updated at 10:40 PM ---------- Previous update was at 05:11 PM ----------
The problem really is the newline. I can get all the information I need out of every file, but awk outputs everything on one line:
Quote:
1 of 28 DOCUMENTS Copyright 2010 Dagbladet Politiken All Rights Reserved ......ds -- 2 of 28 DOCUMENTS Copyright 2010 Dagbladet Politiken
Where it encounters the new-record string that I define at the beginning of the script, it inserts a double dash in the output (see just before "2 of 28"), but it won't print the records on separate lines.
If I can just break up that output into one record per line, which I thought awk was supposed to do, I'm off to the races...
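On the one-line output: awk ends every print with ORS, which is a newline by default, so a plain print already gives one record per line. When everything ends up on one line, something else in the pipeline is usually joining the lines (for example printf without \n, or a later tr or paste -s step). Also, if the text went through grep -A first, the -- before "2 of 28" is most likely grep's own separator between non-adjacent groups of context lines, not something awk added. Assuming that, here is a minimal sketch that drops the -- lines and glues each record onto a single line, starting a new line at each DOCUMENTS phrase. The sample text and file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for grep -A output, with grep's "--" separator
cat > grep_out.txt <<'EOF'
1 of 28 DOCUMENTS
Copyright 2010 Dagbladet Politiken
All Rights Reserved
--
2 of 28 DOCUMENTS
Copyright 2010 Dagbladet Politiken
EOF

awk '
$0 == "--" { next }                   # drop grep group separators
/[0-9]+ of [0-9]+ DOCUMENTS/ {        # a new record starts here
    if (rec != "") print rec          # print ends the previous record with ORS (newline)
    rec = $0
    next
}
{ rec = rec " " $0 }                  # any other line is glued onto the current record
END { if (rec != "") print rec }      # flush the last record
' grep_out.txt > one_per_line.txt
```

If grep was only there to find the record boundaries, the same awk can also read an original file directly (the -- rule then never fires), although each whole article will then end up on its line.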