Hello,
Is there any way to align a pipe-delimited text file by maximum field length, where the fields are separated by pipes, for large text files with more than
100,000 rows?
So far I have searched other forums and Google about aligning text files in Unix, and I have noticed that several other users use the awk utility. Since I am new to awk,
I attempted to write my own code after reading up on some of the awk syntax, but I am getting stuck.
If awk is not the best utility to achieve this, is there any other way to code this?
My test code:
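#!/bin/ksh
# Print the maximum length found in each pipe-delimited field.
awk 'BEGIN { FS = "|" }
{
    for (i = 1; i <= NF; i++)
        if (length($i) > max[i])   # max[] is indexed by field number
            max[i] = length($i)
    if (NF > nf) nf = NF           # remember the widest record seen
}
END {                              # the opening brace must share a line with END
    for (i = 1; i <= nf; i++) print i, max[i]
}
' $(find . -name "testfile.txt")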
Below is a sample of the text file that I have:
Pipe Delimited Text file
YEAR|NAME|PRODUCT_ID|ORDER_ID|CUSTOMER_ID
2001|Unix book|12354|01587|5487651484
2002|Programming|65487|6564548|654365146
2003|Airsoft Guns|6544888|548|65498
2004|Video Games|101100018|44|648
2010|Wayside Stories from wayside school|5487454|4|64645646
.
.
.
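One possible way to do the alignment itself is a two-pass awk run: the first pass records the widest value in each column and the second pass pads every field to that width. This is only a sketch under a few assumptions: the function name align_pipes is made up for illustration, fields are left-justified, the input is a regular file that can be read twice, and length() is assumed to count display columns (true for plain ASCII data like the sample).

```shell
#!/bin/sh
# align_pipes FILE: pad every pipe-delimited field to the maximum width of its column.
# (align_pipes is a hypothetical helper name, not an existing utility.)
align_pipes() {
    # The file is named twice so awk reads it in two passes.
    # Pass 1 (NR == FNR): remember the widest value seen in each column.
    # Pass 2: print each field left-justified to that width.
    awk -F'|' '
    NR == FNR {
        for (i = 1; i <= NF; i++)
            if (length($i) > w[i]) w[i] = length($i)
        next
    }
    {
        line = ""
        for (i = 1; i <= NF; i++) {
            sep = (i < NF) ? "|" : ""
            line = line sprintf("%-" w[i] "s", $i) sep
        }
        print line
    }' "$1" "$1"
}
```

It could be called as align_pipes testfile.txt > aligned.txt. Only the column widths are held in memory, so a file with more than 100,000 rows should not be a problem; the cost is reading the file twice.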
Hi all,
I tried to write a shell script to read a huge file and remove the records of maximum length, which are wrongly generated records. But I get an error:
remove_sp.sh: line 27: syntax error near unexpected token `else'
remove_sp.sh: line 27: ` else $LINE >> REJFILE'
My shell is here:
#!/bin/sh... (5 Replies)
Hi all,
I have a flat file of 1000 rows. I want to check the length of the 5th column and set the longest length found as a defined parameter,
so that later I can check the other values against that number.
Any ideas? (2 Replies)
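A rough sketch of one way to capture that number in a shell variable with awk. The function name max_len is hypothetical, and since the post does not say how the columns are separated, the separator defaults to a pipe here purely as an assumption and can be passed as a third argument.

```shell
#!/bin/sh
# max_len FILE COL [SEP]: print the length of the longest value in column COL.
# (max_len is a hypothetical helper; SEP defaults to "|" as an assumption.)
max_len() {
    awk -F"${3:-|}" -v col="$2" '
    length($col) > max { max = length($col) }
    END { print max + 0 }
    ' "$1"
}
```

The result can then be stored for later checks, for example MAXLEN=$(max_len data.txt 5), where data.txt stands in for the real file name. The "+ 0" makes the function print 0 for an empty file.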
Hi,
This is my first post to this site, so kindly forgive me if I am writing in the wrong section.
My query is this:
I want to increase the maximum username length. I guess it is 32/64 on CentOS, and I want to change it to 128. Is there any way to do that?
Thanks in advance!! :) (4 Replies)
Hey, anyone...
Do you know any way I can modify the max username length in Unix? I guess it is 32/64 characters by default. Suppose I want to increase it to 128.
I have tried /etc/skel,
but it was no use.
How can I do that? (2 Replies)
Hello Everyone,
I am stuck on an issue while working with an abstract flat file, which I have to use as input to load data into a table.
Input Data-
------ ------------------------ ---- -----------------
WFI001 Xxxxxx Control Work Item A Number of Records
------ ------------------------... (5 Replies)
Hi All,
I am new to Perl and was trying to write a simple program that generates a text file as output.
Now the output I am getting is something like this:
==================================================================================================
Col1 ... (8 Replies)
Hi,
I have a huge file that has data something like shown below:
huge_file.txt
start regexp
Name=Name1
Title=Analyst
Address=Address1
Department=Finance
end regexp
some text
some text
start regexp
Name=Name2
Title=Controller
Address=Address2
Department=Finance
end regexp (7 Replies)
Hi guys,
Pick the 1st field and calculate its max length.
Suppose the max length is 2;
then compare all the records, and if a value is shorter than 2, prefix it with zeros.
For example:
s.no,sname
1,djud
37,jtuhe
Here the max length of the 1st field is 2, so
the output will be
s.no,sname
01,djud... (6 Replies)
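A minimal sketch of that zero-padding with a two-pass awk run, assuming a comma-separated file whose first line is a header that should pass through unchanged (as in the example above). The function name pad_first_field is made up for illustration.

```shell
#!/bin/sh
# pad_first_field FILE: left-pad field 1 with zeros to the longest field-1 value.
# (pad_first_field is a hypothetical helper; the header line is assumed to be line 1.)
pad_first_field() {
    # The file is read twice: pass 1 finds the maximum length of field 1,
    # pass 2 prints the records with field 1 padded to that length.
    awk -F',' -v OFS=',' '
    FNR == 1  { if (NR != FNR) print; next }   # header: ignored in pass 1, printed in pass 2
    NR == FNR { if (length($1) > w) w = length($1); next }
    {
        while (length($1) < w) $1 = "0" $1
        print
    }' "$1" "$1"
}
```

Calling pad_first_field on the three-line sample above should turn 1,djud into 01,djud and leave 37,jtuhe as it is.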
tabmerge
TABMERGE(1p)            User Contributed Perl Documentation            TABMERGE(1p)
NAME
tabmerge - unify delimited files on common fields
SYNOPSIS
tabmerge [action] [options] file1 file2 [...]
Actions:
--min                 Take only fields present in all files [DEFAULT]
--max                 Take all fields present
-f|--fields=f1[,f2]   Take only the fields mentioned in the
                      comma-separated list
Options:
-l|--list             List available fields
--fs=x                Use "x" as the field separator
                      (default is tab "\t")
--rs=x                Use "x" as the record separator
                      (default is newline "\n")
-s|--sort=f1[,f2]     Sort data ASCII-betically on field(s)
--stdout              Print data in original delimited format
                      (i.e., not in a table format)
--help                Show brief help and quit
--man                 Show full documentation
DESCRIPTION
This program merges the fields -- not the rows -- of delimited text files. That is, if several files are almost but not quite entirely
unlike each other in their structure (in their field names, numbers or orders), this script allows you to easily unify the files into one
file with all the same fields. The output can be based on fields as determined by the three "action" flags.
For the following examples, consider three files that contain the following fields:
+------------+---------------------------------+
| File       | Fields                          |
+------------+---------------------------------+
| merge1.tab | name, type, position            |
| merge2.tab | name, type, position, lod_score |
| merge3.tab | name, position                  |
+------------+---------------------------------+
To list all available fields in the files and the number of times they are present:
$ tabmerge --list merge*
+-----------+-------------------+
| Field     | No. Times Present |
+-----------+-------------------+
| lod_score | 1                 |
| name      | 3                 |
| position  | 3                 |
| type      | 2                 |
+-----------+-------------------+
To merge the files on the minimum overlapping fields:
$ tabmerge merge*
+----------+----------+
| name     | position |
+----------+----------+
| RM104    | 2.30     |
| RM105    | 4.5      |
| TX5509   | 10.4     |
| UU189    | 19.0     |
| Xpsm122  | 3.3      |
| Xpsr9556 | 4.5      |
| DRTL     | 2.30     |
| ALTX     | 4.5      |
| DWRF     | 10.4     |
+----------+----------+
To merge the files and include all the fields:
$ tabmerge --max merge*
+-----------+----------+----------+--------+
| lod_score | name     | position | type   |
+-----------+----------+----------+--------+
|           | RM104    | 2.30     | RFLP   |
|           | RM105    | 4.5      | RFLP   |
|           | TX5509   | 10.4     | AFLP   |
| 2.4       | UU189    | 19.0     | SSR    |
| 1.2       | Xpsm122  | 3.3      | Marker |
| 1.2       | Xpsr9556 | 4.5      | Marker |
|           | DRTL     | 2.30     |        |
|           | ALTX     | 4.5      |        |
|           | DWRF     | 10.4     |        |
+-----------+----------+----------+--------+
To merge and extract just the "name" and "type" fields:
$ tabmerge -f name,type merge*
+----------+--------+
| name     | type   |
+----------+--------+
| RM104    | RFLP   |
| RM105    | RFLP   |
| TX5509   | AFLP   |
| UU189    | SSR    |
| Xpsm122  | Marker |
| Xpsr9556 | Marker |
| DRTL     |        |
| ALTX     |        |
| DWRF     |        |
+----------+--------+
To merge the files on just the "name" and "lod_score" fields and sort on the name:
$ tabmerge -f name,lod_score -s name merge*
+----------+-----------+
| name     | lod_score |
+----------+-----------+
| ALTX     |           |
| DRTL     |           |
| DWRF     |           |
| RM104    |           |
| RM105    |           |
| TX5509   |           |
| UU189    | 2.4       |
| Xpsm122  | 1.2       |
| Xpsr9556 | 1.2       |
+----------+-----------+
To do the same but mimic the original tab-delimited input:
$ tabmerge -f name,lod_score -s name --stdout merge*
name lod_score
ALTX
DRTL
DWRF
RM104
RM105
TX5509
UU189 2.4
Xpsm122 1.2
Xpsr9556 1.2
Why would you want to do this? Suppose you have several delimited text files with nearly the same structure and want to create just one
file from them, but the fields may be in a different order in each file and/or some files may contain more or fewer fields than others.
(As far-fetched as it may seem, it happens to the author more than he'd like.)
SEE ALSO
o Text::RecordParser
o Text::TabularDisplay
AUTHOR
Ken Youens-Clark <kclark@cpan.org>.
LICENSE AND COPYRIGHT
Copyright (C) 2006-10 Ken Youens-Clark. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
perl v5.10.1 2010-07-26 TABMERGE(1p)