Shell Programming and Scripting > Rearrange fields of delimited text file
Post 303003162 by durden_tyler on Friday, 8th September 2017, 08:23 AM
Quote:
Originally Posted by andy2000
@rdrtx1: your solution doesn't work with more than 2 rows:

a_13;a_2;a_1;a_10
13;2;1;10
23,22,11,100

---------- Post updated at 05:13 AM ---------- Previous update was at 05:10 AM ----------

Sorry, none of your solutions work with more than 3 rows:

a_13;a_2;a_1;a_10
13;2;1;10
23,22,11,100
...

That's because you changed the delimiter character from ";" (semicolon) to "," (comma) in the last line of your data file.

If I keep the delimiter consistent across all rows, i.e. ";", then it does work:

Code:
$
$ cat data.txt
a_13;a_2;a_1;a_10
13;2;1;10
23,22,11,100
$
$ # Will not work: the program splits on ";" but the last line is delimited by ","
$ perl -F';' -lane 'if ($. == 1){
                        %x = map{ $F[$_] => $_ } (0..$#F);
                        @s = map{ $x{$_} } sort { (split "_", $a)[1] <=> (split "_", $b)[1] } @F;
                    }
                    print join ";", map{ $F[$s[$_]] }(0..$#F);
                   ' data.txt
a_1;a_2;a_10;a_13
1;2;10;13
 
$
$ # After changing data.txt
$
$ cat data.txt
a_13;a_2;a_1;a_10
13;2;1;10
23;22;11;100
$
$ # Will work now because the delimiter is consistent for all rows
$
$ perl -F';' -lane 'if ($. == 1){
                        %x = map{ $F[$_] => $_ } (0..$#F);
                        @s = map{ $x{$_} } sort { (split "_", $a)[1] <=> (split "_", $b)[1] } @F;
                    }
                    print join ";", map{ $F[$s[$_]] }(0..$#F);
                   ' data.txt
a_1;a_2;a_10;a_13
1;2;10;13
11;22;100;23
$
$


Last edited by durden_tyler; 09-08-2017 at 09:38 AM..
 
