Rearrange fields of delimited text file Post: 303003131

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Sort the fields in a comma delimited file

Hi, I have a comma delimited file. I want to sort the fields alphabetically and again store them in a comma delimited file. For example, My file looks like this. abc,aaa,xyz,xxx,def pqr,ggg,eee,iii,qqq zyx,lmo,pqr,abc,fff and I want my output to look like this, all fields sorted...

2. Shell Programming and Scripting

awk sed cut? to rearrange random number of fields into 3 fields

I'm working on formatting some attendance data to meet a vendors requirements to upload to their system. With some help on the forums here, I have the data close. But they've since changed what they want. The vendor wants me to submit three fields to them. Field 1 is the studentid field,...

3. Shell Programming and Scripting

Large pipe delimited file that I need to add CR/LF every n fields

I have a large flat file with variable length fields that are pipe delimited. The file has no new line or CR/LF characters to indicate a new record. I need to parse the file and after some number of fields, I need to insert a CR/LF to start the next record. Input file ...

4. Shell Programming and Scripting

Rearrange the text file

Gents, I have a large file and each line of the file contains more than 200 bytes.Please let me a way to have the new line to start when the word "FIT" appears. I was trialling with 'tr' command but i am not sure how to get it based on bytes and so it wasn't working... Current...

5. UNIX for Advanced & Expert Users

Problem while counting number of fields in TAB delimited file

I'm facing a strange problem, please help me out. Here we go. I want to count number of fields in particular file. filename and delimiter character will be passed through parameter. On command prompt if i type following i get 27 as output (which is correct) cat customer.dat | head -1 | awk...

6. Shell Programming and Scripting

Print records which do not have expected number of fields in a comma delimited file

Hi, I have a comma (,) delimited file, in which few fields are enclosed with in double quotes " ". I have to print the records in the file which donot have expected number of field with the line number. File1 ==== name,desgnation,doj,project #header#...

7. Shell Programming and Scripting

Split a free form text delimited by space to words with other fields

Hi, I need your help for below with shell scripting or perl I/P key, Sentence customer1, I am David customer2, I am Taylor O/P Key, Words Customer1,I Customer1,am Customer1,David Customer2,I Customer2,am Customer2,Taylor

8. Shell Programming and Scripting

Using awk to rearrange fields

Hi, I am required to arrange columns of a file i.e make the 15th column into the 1st column. I am doing awk 'begin {fs=ofs=","} {print $15,$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14}' ad.data>ad.csv the problem is that column 15 gets to column 1 but it is not comma separated with the...

9. UNIX for Dummies Questions & Answers

Need to convert a pipe delimited text file to tab delimited

Hi, I have a rquirement in unix as below . I have a text file with me seperated by | symbol and i need to generate a excel file through unix commands/script so that each value will go to each column. ex: Input Text file: 1|A|apple 2|B|bottle excel file to be generated as output as...

10. Shell Programming and Scripting

Pattern Match and Rearrange the Fields in UNIX

For an Output like below Input : <Subject A="I" B="1039502" C="2015-06-30" D="010101010101"> Output : <Subject D="010101010101" B="1039502" C="2015-06-30" A="I"> I have been using something like below but not getting the desired output : awk -F ' ' '/Subject/ BEGIN{OFS=" ";}...

LEARN ABOUT DEBIAN

msort

MSORT(1)							   User Commands							  MSORT(1)

NAME

       msort - sort records in complex ways

SYNOPSIS

       msort <options> [<input file>]

DESCRIPTION

       msort is a program for sorting text files in sophisticated ways.  It was developed initially for alphabetizing dictionaries of languages in
       which the ordering may be quite different from English but has many other uses.

       msort allows you to sort blocks of text delimited in a number of ways rather than just lines and to specify particular fields of  a  record
       as sort keys using either their position, counted from either end, or by matching regular expressions to their tags.

       msort  is  capable  of sorting on multiple keys, so that when two records tie on one key, the tie may be broken on another. Any or all keys
       may be optional.  How absent optional keys are ordered with respect to present keys may be set separately for each key.

       msort allows you to specify arbitrary sort orders and to define virtually unlimited numbers of multigraphs of effectively unlimited length.
       The  sort  order  and multigraphs are defined separately for each key. If your system has locale support, you can also use locale collation
       rules instead of specify your own sort order.

       msort provides twelve types of key comparison: lexicographic, numeric, numeric string, hybrid, by string length,  by  angle,  by  date,	by
       domain name, by time, by ISO8601 date/time stamp, by month name, and random.

       What  month  names  are used is a bit complicated. If the -s flag is used on the same key and its argument is the name of a file, the month
       names are read from the file, which should be in the same format as a sort order definition file. If the -s flag is used and  its  argument
       is a locale name, the month names recognized will be the month names and abbreviations associated with the specified locale. If the -s flag
       is not used the month names recognized will be the month names and abbreviations associated with the current locale. If	your  system  does
       not  have locale support and you do not use the -s flag to read the month names from a file, the month names recognized will be the English
       month names and abbreviations.

       msort can reverse the characters in a key, allowing it to be used to generate reverse dictionaries.

       A choice of sorting algorithms is provided.

       msort fully supports Unicode. The text to be sorted, and all specifications, should be in UTF-8 Unicode. (If you  have  plain  ASCII  text,
       this  is not a problem as ASCII is a subset of Unicode.) Full Unicode case-folding is available, in Turkic and non-Turkic variants. Unicode
       normalization is performed before sorting.

       For usage information, execute msort with no arguments.

       Full information about msort is currently to be found in the reference manual, which is distributed as a  PDF  (Portable  Document  Format)
       file. If a copy is not available locally, you can download it from msort's home page:
       http://billposer.org/Software/msort.html

OPTIONS

   Informational options
       -h,--help
	      Print usage message

       -v,--version
	      Print version message

       -D,--defaults
	      List defaults

       -F,--general-options
	      List general command line options

       -G,--gnu-equivalences
	      List equivalents for GNU sort command line options.

       -H,--informational-options
	      List informational command line options

       -K,--key-specific-options
	      List key-specific command line options

       -L,--limits
	      List limits

       -N,--number-systems
	      List the supported number systems.

   General options
       -b,--block
	      A record is terminated by two or more newlines

       -l,--line
	      A record consists of a single line

       -r,--record-separator <separator>
	      A record is terminated by separator character

       -O,--fixed-size-record <bytes>
	      A record consists of the specified number of bytes.

       -d,--field-separators <character>+
	      Fields are delimited by the named character(s)

       -w,--whole
	      Sort on the entire text of the record

       -a,--algorithm <algorithm>
	      Use the specified sort algorithm. The choices are: I(nsertionSort), M(ergeSort), Q(uickSort), and S(hellSort).  Note that Insertion-
	      Sort and MergeSort are stable, while QuickSort and ShellSort are unstable. The default is QuickSort.

       -M,-initial-maximum-records <records>
	      Set initial maximum number of records

       -m,--line-end-carriage-return
	      End-of-line in the input data is marked by Carriage Return (0x0D) as on the Macintosh rather than by Line Feed  (0x0A)  as  on  Unix
	      systems.

       -I,--invert-globally
	      Invert sense of comparisons globally

       -B,--BMP
	      No characters fall outside the Basic Multingual Plane (that is, have values greater than 0xFFFF).

       -p,--reserve-private-use-area
	      Do  not  make internal use of the Private Use areas. By default, multigraphs are assigned internally to codepoints in the Supplemen-
	      tary Private Use areas if full Unicode is in use or to codepoints in the Private Use area if input is restricted to the Basic Multi-
	      lingual  Plane  by  means of the -B option. If your input makes use of the Private Use areas, this option prevents interference with
	      your input. In this case, multigraphs will be assigned to the Low and High Surrogate areas (0xD800-0xDFFF). Note	that  this  limits
	      the number of multigraphs to 2,048.

       -P,--random-seed <seed>
	      Set  the	seed  for  the	random	number	generator.  If not set here, it is set to a value determined by the time. The seed used is
	      reported in the log. This option allows runs to be replicated.

       -Q,--check-only
	      Check whether the input is already sorted. Do not generate any output.  Exit status is 0 if input  is  already  sorted,  11  if  not
	      sorted.

       -1,--in <input file name>

       -2,--out <output file name>
	      If  the output file is the same as the input file, the input file will be overwritten. The input file will not be overwritten if the
	      run is unsuccessful.

       -j,--suppress-log
	      Suppress output to the log. If this flag is given before there is any output to the log from a command line flag,  nothing  will	be
	      written  to  the	log  and the log file will not be created. If a command line flag generates a log message before this flag is pro-
	      cessed, the log file will be created but no log messages will be written to it once this flag is processed.  To  guarantee  that	no
	      attempt will be made to open a log file, give this flag first.

       -q,--quiet
	      Be quiet - do not chat while working

       -u,--unicode-normalization <mode>
	      Select  Unicode normalization mode. The choices of mode are: c for normalization form C (NFC), d for normalization form D (NFD), and
	      n for no normalization. The default is NFC.

   Key specific options
       -e,--character-range <m,n>
	      Sort on characters m through n. Positive indices start from one.	Negative indices indicate position with respect to the end of  the
	      record.  For example, the range 3,-2 consists of the third character through the next-to-last character.

       -n,--position <POS>(,<POS>)
	      Sort  on	the specified POS or contiguous range of POSs, where a POS is of the form <field number>(.<character number>). Both counts
	      begin at one.  Field numbers but not character numbers may be negative, in which case they are counted from the right. Thus, 1.2	is
	      the second character of the first field; -2.1 is the first character of the next to last field.

       -t,--tag <tag regexp>
	      Sort on the field with the specified tag

       -o,--optional <comparison>
	      Optional: compare as (<,=,>) to present key if absent

       -C,--fold-case
	      Fold case

       -z,--fold-case-turkic
	      Fold case with additional Turkic conversions.

       -c,--comparison-type <comparison type>
	      a(ngle),l(exicographic),	i(so8601  date/time),  t(ime),	D(omain  name/email  address),	d(ate),  m(onth name), n(umeric), N(umeric
	      string),s(ize), h(hybrid), r(andom)

       -y,--number-system <number system>
	      Specifies the number system expected for this key. This affects only numeric and numeric string keys. There are two special  values.
	      If the number system is "all", records may contain any number system that msort can interpret. Different records may contain differ-
	      ent number systems.  If the number system is "any", records may contain any writing system that msort can interpret, but all records
	      must make use of the same number system.	msort sets the number system on the basis of the first record.

       -f,--date-format <date format>
	      Permutation of ymd with separators, e.g. y-m-d for international date format, m/d/y for American date format, or a permutation of yd
	      with separators, e.g. y-d, for day-of-year dates. All three components may be numbers in any  available  number  system.	The  month
	      field may also be a month name, determined by the same devices as independent month name fields.

       -W,--sort-order-file-separators <file name>
	      Read the list of characters to be treated as separators in the sort order definition file.

       -S,--substitutions <file name>
	      Read substitutions from named file

       -s,--sort-order <file name>|<locale name>|"locale"
	      If  the  argument  is  a file name, it is taken to be a sort order file and the sort order for the key is read from the file. If the
	      argument is a locale name, the collation rules for that locale are used. If the argument is "locale", the collation  rules  for  the
	      current locale are used.

       -T,--transformations <(d)(e)(s)>
	      Apply  the  specified  transformations.  d specifies that diacritics are to be stripped. Separately encoded combining diacritics are
	      removed. Characters with diacritics represented by    single codepoints are replaced with the corresponding ASCII character  without
	      the diacritics, if there is one.	e specifies that enclosed characters, that is, characters within circles or parentheses, are to be
	      replaced with the corresponding plain ASCII character if there is one.  s specifies that characters in  special  styles  are  to	be
	      replaced	with the corresponding plain ASCII character if there is one. Stylistic equivalents include: small capitals (e.g. U+1D04),
	      script forms (e.g. U+212C), black letter forms (e.g. U+212D), Arabic presentation forms (e.g.  U+FE81),  Hebrew  presentation  forms
	      (e.g.  U+FB1D),  fullwidth  forms  (e.g.	U+FF01),  halfwidth  forms  (e.g. U+FF7B), and the mathematical alphanumeric symbols (e.g.
	      U+1D400).

       -x,--exclusion-file <file name>
	      Read exclusions from named file

       -X,--exclude-characters <exclusions>
	      Exclude specified characters

       -i,--invert-locally
	      Invert sense of comparisons

       -R,--reverse-key
	      Reverse characters of key

       -A,--first-character-only
	      Ignore all but the first character of the field, after substitutions, exclusions, etc.

       Note: long options may not be available on your system.

SEE ALSO

       sort(1), uninum(3)

AUTHOR

       Bill Poser (billposer@alum.mit.edu)

LICENSE

       GNU General Public License (http://www.gnu.org/licenses/gpl.html), version 3.

msort								   October 2008 							  MSORT(1)

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Sort the fields in a comma delimited file

Discussion started by: swethapatil

2. Shell Programming and Scripting

awk sed cut? to rearrange random number of fields into 3 fields

Discussion started by: axo959

3. Shell Programming and Scripting

Large pipe delimited file that I need to add CR/LF every n fields

Discussion started by: clintrpeterson

4. Shell Programming and Scripting

Rearrange the text file

Discussion started by: appu2176