Best way to sort a file with groups of 4-5 lines of text by the first line


 
# 1  
05-08-2018

Hi, I have some data taken from the internet in the following format:

Code:
name
direction
webpage
phone number
open hours
menu url
book url

name
...

Of course, the only mandatory line is the name, which is the one I want to sort by.
I have the following sed & awk script, which works, but I would like to know what mistakes I made or whether there is another (better) way:

Code:
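#!/bin/sh
set -e

rnb=/tmp/res_no_blank
rns=/tmp/res_names_sorted

# Error messages go to stderr
[ $# -ne 1 ] && echo "need file as argument" >&2 && exit 1
[ ! -s "$1" ] && echo "file is empty" >&2 && exit 1

# Squeeze runs of empty lines into a single empty line
awk '/^$/{ if (! blank++) print; next } { blank=0; print }' "$1" > "$rnb"

# Restaurant names: the first line of the file plus every non-empty line
# that follows an empty one; sorted with duplicates removed
awk 'NR == 1 || prev == "" { if ($0 != "") print } { prev = $0 }' "$rnb" | sort -u > "$rns"

# Print each name's block, from the name line up to the next empty line.
# IFS= read -r keeps leading/trailing spaces and backslashes intact.
# The name is anchored (^...$) so it cannot match inside an address, but it
# is still used as a regex: names containing / or regex metacharacters break it.
while IFS= read -r rest
do
        sed -n -E '/^'"$rest"'$/,/^$/p' "$1" | sed 's/Cerrado hoy/'"$(date +%a)"' cerrado/'
done < "$rns" > "$1.sorted"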

Sample of the data:

Code:
Cardamomo Tablao Flamenco
Calle Echegaray, 15, 28014 Madrid
cardamomo.com
918 05 10 38
reservas: https://cardamomo.com/es/comprar-entradas-flamenco/?utm_source=google%20my%20business&utm_medium=google%2B&utm_campaign=link%20a%20comprar%20entradas

Vermú
Calle de Jesús, 6, 28014 Madrid
914 21 55 65
Cerrado hoy

Rodilla
Calle de Alcalá, nº 67, local Izquierdo, 28014 Madrid
rodilla.es
917 55 53 22
 8:00–21:30

El Patio Vertical
Calle de Almadén, 26, 28014 Madrid
elpatiovertical.es
914 20 16 63
 8:30–21:00

Restaurante La Tragantua
Calle de la Verónica, 4, 28014 Madrid
latragantua.es


Last edited by devmsv; 05-08-2018 at 07:58 AM. Reason: copy/paste mess
# 2  
05-08-2018
Welcome to the forum.

Not sure I understood your script in its entirety, but for the sorting you could make use of an awk feature. From man awk:
Quote:
Multi-line records
Since mawk interprets RS as a regular expression, multi-line records are easy. Setting RS = "\n\n+", makes one or more blank lines separate records.
Applying this to your data sample, how close would this be:
Code:
awk '$1=$1' RS= FS="\n" OFS="\t" file | sort
Cardamomo Tablao Flamenco	Calle Echegaray, 15, 28014 Madrid	cardamomo.com	918 05 10 38	reservas: https://cardamomo.com/es/comprar-entradas-flamenco/?utm_source=google%20my%20business&utm_medium=google%2B&utm_campaign=link%20a%20comprar%20entradas
El Patio Vertical	Calle de Almadén, 26, 28014 Madrid	elpatiovertical.es	914 20 16 63	 8:30–21:00
Restaurante La Tragantua	Calle de la Verónica, 4, 28014 Madrid	latragantua.es
Rodilla	Calle de Alcalá, nº 67, local Izquierdo, 28014 Madrid	rodilla.es	917 55 53 22	 8:00–21:30
Vermú	Calle de Jesús, 6, 28014 Madrid	914 21 55 65	Cerrado hoy
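The one-liner leaves each block on a single TAB-separated line. If the goal is a sorted file that keeps the original multi-line layout, the same trick can be reversed after sorting. What follows is a minimal sketch, not a tested replacement for the script above: `sort_blocks` is a hypothetical helper name, it assumes no line contains a literal TAB (TAB is used as a temporary joiner), and the resulting order follows the current locale's collation (LC_COLLATE).

```shell
# Hypothetical helper: sort blank-line-separated blocks by their first line
# (the name) and restore the multi-line layout afterwards.
# Assumes no line contains a literal TAB, since TAB is the temporary joiner.
sort_blocks() {
    tab=$(printf '\t')
    # Paragraph mode (RS=""): one record per block, one field per line.
    # Assigning $1 = $1 rebuilds $0 with OFS, i.e. joins the lines with TABs.
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "\t" } { $1 = $1; print }' "$@" |
    # Sort on the first TAB-separated field only (the name).
    sort -t "$tab" -k1,1 |
    # Split each line back into its original lines, printing one empty line
    # between consecutive blocks.
    awk 'BEGIN { FS = "\t"; OFS = "\n" } { if (NR > 1) print ""; $1 = $1; print }'
}
```

Usage would be something like `sort_blocks file > file.sorted`. Sorting with `-k1,1` compares only the name, so the rest of a block (addresses, URLs) does not influence the order unless two names are identical.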

# 3  
05-08-2018

Sorry, I messed up the script when copy-pasting.
I will try your solution (and try to understand it); it looks like it fits my goals better.

---------- Post updated at 01:32 PM ---------- Previous update was at 01:02 PM ----------

As always, the man pages have the answer:
Code:
     The input is normally made up of input lines (records) separated by
     newlines, or by the value of RS.  If RS is null, then any number of blank
     lines are used as the record separator, and newlines are used as field
     separators (in addition to the value of FS).  This is convenient when
     working with multi-line records.
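To see what this paragraph mode does with data like the sample above, here is a small sketch using a hypothetical helper, `list_blocks`. It prints each record's number, its first line, and how many lines the block has; because `FS = "\n"`, every line of a block is one field, so `NF` counts lines.

```shell
# Hypothetical helper: report each blank-line-separated block.
# RS = "" enables paragraph mode (any run of empty lines ends a record);
# FS = "\n" makes each line of the block a separate field.
list_blocks() {
    awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 " (" NF " lines)" }' "$@"
}

# Three empty lines between the blocks still act as a single separator.
printf 'Vermu\nCalle de Jesus\n\n\n\nRodilla\nrodilla.es\n917 55 53 22\n' | list_blocks
# 1: Vermu (2 lines)
# 2: Rodilla (3 lines)
```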

What I don't understand is why '$1=$1' is needed.
# 4  
05-08-2018
man awk:
Quote:
Assignment to $0 causes the fields and NF to be recomputed. Assignment to NF or to a field causes $0 to be reconstructed by concatenating the $i's separated by OFS.
So $1 is assigned to itself, unmodified. That assignment alone makes awk rebuild $0, joining the fields with the new OFS (<TAB>, \t) instead of the original separator (<newline>, \n), which yields a single-line record ready for sorting.
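The rebuild can be observed directly. This minimal sketch, using a hypothetical helper named `show_rebuild`, prints the same record before and after the no-op assignment:

```shell
# Hypothetical helper: show $0 before and after the no-op assignment $1 = $1.
show_rebuild() {
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "\t" }
         {
             print "before: " $0   # original text, lines still separated by newlines
             $1 = $1               # field assignment: $0 is rebuilt using OFS
             print "after:  " $0   # same fields, now joined by TABs
         }' "$@"
}

printf 'Rodilla\nrodilla.es\n' | show_rebuild
```

One side effect of writing `'$1=$1'` as a bare pattern, as in the one-liner: the value of the assignment also decides whether the record is printed, so a record whose first line is `0` would be silently dropped. Writing `'{ $1 = $1; print }'` avoids that.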