Get column number with the same character length


 
# 1  
12-24-2015

So, I want to tail roughly the last 3000 lines of a log file and find the columns that have the same number of characters across all 3000 lines (or most of them).

Code:
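set -f    # disable globbing so fields such as * are not expanded into file names
tail -n 3000 logfile | while read -r line
do
        # build one "word=length" pair per space-separated field
        # ($line is left unquoted on purpose so the shell splits it into words)
        ColumnCharCount=$(for eachwordorwhatever in $line
        do
                  charcount=$(printf '%s\n' "$eachwordorwhatever" | awk '{print length}')
                  echo "${eachwordorwhatever}=${charcount}"
        done)
        # unquoted on purpose: joins the pairs onto a single output line
        echo ${ColumnCharCount}
done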

This is as far as I got before I realized it was turning out to be very inefficient, since it starts a separate awk process for every field of every line.

Any awk solutions for this?

When completed, the code should tell me something like "column X has the same number of characters (N) across L lines".

The log output can be in any format, but for the purposes of this thread, let's assume each line in the log is space-delimited.
# 2  
12-24-2015
Your specification is a little bit too vague for me to understand what you're trying to do. And the sample script you have shown us doesn't seem to be doing what you're describing.

Why don't you give us a much smaller sample log file (maybe about 20 lines) and show us exactly what output you are hoping to produce from that input? Then please explain how that sample output is derived from that sample input.
# 3  
12-24-2015
Sorry, I didn't post the log output because the log itself doesn't really matter; I intend to use this on logs of any format.

Code that does something close to what I need (I added NF to also show me the number of columns on each line):

Code:
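logfile=$1
set -f    # disable globbing so fields such as * are not expanded into file names

tail -n 3000 "$logfile" | while read -r line
do
        # one "word=length" pair per space-separated field
        ColumnCharCount=$(for eachwordorwhatever in $line
        do
                  charcount=$(printf '%s\n' "$eachwordorwhatever" | awk '{print length}')
                  echo "${eachwordorwhatever}=${charcount}"
        done)
        # NF has to be taken from the whole line: inside the loop above
        # awk only sees one word, so NF would always be 1
        nf=$(printf '%s\n' "$line" | awk '{print NF}')
        echo ${ColumnCharCount} NF=${nf}
done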
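Both loops above start a new awk process for every field of every line, which is what makes them slow on 3000 lines. A rough sketch of the same listing done by a single awk process is below. It is not from the thread: the function name print_field_lengths is hypothetical, and it assumes fields are separated by runs of blanks (awk's default field splitting).

```shell
# Hypothetical helper name: print_field_lengths
# Reads lines on stdin and, using one awk process for the whole input,
# prints every field as word=length, followed by NF=<number of fields on the line>.
print_field_lengths() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++)
            out = out $i "=" length($i) " "    # append "field=length " for field i
        print out "NF=" NF                      # NF is the field count of the whole line
    }'
}

# Usage, assuming a log file named logfile as in the thread:
# tail -n 3000 logfile | print_field_lengths
```

Because awk reads the input once, the cost grows with the size of the input instead of with the number of processes started.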

Sample log (each field shown as word=length):

Code:
Dec=3 21=2 11:13:59=8 jserver-VirtualBox=17 kernel:=7 1=1 VbglR0HGCMInternalCall:=23 vbglR0HGCMInternalDoCall=24 failed.=7 rc=-2=5
Dec=3 21=2 11:13:59=8 jserver-VirtualBox=17 kernel:=7 1=1 VBOXGUEST_IOCTL_HGCM_CALL:=26 64=2 Failed.=7 rc=-2.=6
Dec=3 21=2 12:09:28=8 jserver-VirtualBox=17 kernel:=7 1=1 VbglR0HGCMInternalCall:=23 vbglR0HGCMInternalDoCall=24 failed.=7 rc=-2=5
Dec=3 21=2 12:09:28=8 jserver-VirtualBox=17 kernel:=7 1=1 VBOXGUEST_IOCTL_HGCM_CALL:=26 64=2 Failed.=7 rc=-2.=6
Dec=3 21=2 12:39:44=8 jserver-VirtualBox=17 kernel:=7 1=1 8=1 VbglR0HGCMInternalCall:=23 vbglR0HGCMInternalDoCall=24 failed.=7 rc=-2=5
Dec=3 21=2 12:39:44=8 jserver-VirtualBox=17 kernel:=7 1=1 8=1 VBOXGUEST_IOCTL_HGCM_CALL:=26 64=2 Failed.=7 rc=-2.=6
Dec=3 21=2 14:22:27=8 jserver-VirtualBox=17 kernel:=7 1=1 VbglR0HGCMInternalCall:=23 vbglR0HGCMInternalDoCall=24 failed.=7 rc=-2=5
Dec=3 21=2 14:22:27=8 jserver-VirtualBox=17 kernel:=7 1=1 8=1 VBOXGUEST_IOCTL_HGCM_CALL:=26 64=2 Failed.=7 rc=-2.=6
Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Initializing=12 cgroup=6 subsys=6 cpuset=6
Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Initializing=12 cgroup=6 subsys=6 cpu=3
Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Initializing=12 cgroup=6 subsys=6 cpuacct=7
Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Linux=5 version=7 4.2.0-19-generic=16 (buildd@lgw01-31)=17 (gcc=4 version=7 5.2.1=5 20151010=8 (Ubuntu=7 5.2.1-22ubuntu2)=16 )=1 #23-Ubuntu=10 SMP=3 Wed=3 Nov=3 11=2 11:38:40=8 UTC=3 2015=4 (Ubuntu=7 4.2.0-19.23-generic=19 4.2.6)=6
Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 KERNEL=6 supported=9 cpus:=5
Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Intel=5 GenuineIntel=12
Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 AMD=3 AuthenticAMD=12
Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 NSC=3 Geode=5 by=2 NSC=3
Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Cyrix=5 CyrixInstead=12
Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Centaur=7 CentaurHauls=12
Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Transmeta=9 GenuineTMx86=12
Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Transmeta=9 TransmetaCPU=12
Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 UMC=3 UMC=3 UMC=3 UMC=3

# 4  
12-24-2015
Any chance that, from that latest sample log, you could manually create an example of the output you would like to see and show how it should be formatted?
Just looking at your code, it is not clear what you want, since it is not even doing what you wish at the moment. And your request specification doesn't help clarify it.
# 5  
12-24-2015
Just a hunch: did you want something like this?

Column 2 has a consistent width of 4 characters.
Column 4 has a consistent width of 5 characters.

Code:
$ 
$ cat data.log
wtx srtv ty5$q uyrzx
p   56ye iouv  quxzo
ww  uytz rekj  ww#zz
$ 
$ 
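$ awk '{
           if (NF > maxnf) maxnf = NF              # widest line seen so far
           for (i=1; i<=NF; i++) {
               # keep the width while it matches every earlier line; -1 marks a mismatch
               x[i] = (NR == 1 || length($i) == x[i]) ? length($i) : -1
           }
           # a column missing from this line cannot have a consistent width
           for (i=NF+1; i<=maxnf; i++) x[i] = -1
       }
       END {
           # loop over every column seen, not just the fields of the last line
           for (i=1; i<=maxnf; i++) {
               if (x[i] != -1) {
                   print "Column ", i, " has ", x[i], " characters across ", NR, " lines"
               }
           }
       }' data.log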
Column  2  has  4  characters across  3  lines
Column  4  has  5  characters across  3  lines
$ 
$

This User Gave Thanks to durden_tyler For This Post:
# 6  
12-24-2015
So in the output I provided, there are 6 columns that have the same number of characters across all 21 lines: columns 1, 2, 3, 4, 5 and 6. After the 6th column of each line, you'll notice the numbers tend to differ across all or most lines.

Basically, I just want to be able to identify which columns have the same number of characters across all lines read from a log.
# 7  
Old 12-24-2015
Quote:
Originally Posted by SkySmart
So in the output I provided, there are 6 columns that have the same number of characters across all 21 lines: columns 1, 2, 3, 4, 5 and 6. After the 6th column of each line, you'll notice the numbers tend to differ across all or most lines.

Basically, I just want to be able to identify which columns have the same number of characters across all lines read from a log.

The awk script does identify the first six columns, as per your requirement.

Code:
$ 
$ cat -n data.log
     1	Dec=3 21=2 11:13:59=8 jserver-VirtualBox=17 kernel:=7 1=1 VbglR0HGCMInternalCall:=23 vbglR0HGCMInternalDoCall=24 failed.=7 rc=-2=5
     2	Dec=3 21=2 11:13:59=8 jserver-VirtualBox=17 kernel:=7 1=1 VBOXGUEST_IOCTL_HGCM_CALL:=26 64=2 Failed.=7 rc=-2.=6
     3	Dec=3 21=2 12:09:28=8 jserver-VirtualBox=17 kernel:=7 1=1 VbglR0HGCMInternalCall:=23 vbglR0HGCMInternalDoCall=24 failed.=7 rc=-2=5
     4	Dec=3 21=2 12:09:28=8 jserver-VirtualBox=17 kernel:=7 1=1 VBOXGUEST_IOCTL_HGCM_CALL:=26 64=2 Failed.=7 rc=-2.=6
     5	Dec=3 21=2 12:39:44=8 jserver-VirtualBox=17 kernel:=7 1=1 8=1 VbglR0HGCMInternalCall:=23 vbglR0HGCMInternalDoCall=24 failed.=7 rc=-2=5
     6	Dec=3 21=2 12:39:44=8 jserver-VirtualBox=17 kernel:=7 1=1 8=1 VBOXGUEST_IOCTL_HGCM_CALL:=26 64=2 Failed.=7 rc=-2.=6
     7	Dec=3 21=2 14:22:27=8 jserver-VirtualBox=17 kernel:=7 1=1 VbglR0HGCMInternalCall:=23 vbglR0HGCMInternalDoCall=24 failed.=7 rc=-2=5
     8	Dec=3 21=2 14:22:27=8 jserver-VirtualBox=17 kernel:=7 1=1 8=1 VBOXGUEST_IOCTL_HGCM_CALL:=26 64=2 Failed.=7 rc=-2.=6
     9	Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Initializing=12 cgroup=6 subsys=6 cpuset=6
    10	Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Initializing=12 cgroup=6 subsys=6 cpu=3
    11	Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Initializing=12 cgroup=6 subsys=6 cpuacct=7
    12	Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Linux=5 version=7 4.2.0-19-generic=16 (buildd@lgw01-31)=17 (gcc=4 version=7 5.2.1=5 20151010=8 (Ubuntu=7 5.2.1-22ubuntu2)=16 )=1 #23-Ubuntu=10 SMP=3 Wed=3 Nov=3 11=2 11:38:40=8 UTC=3 2015=4 (Ubuntu=7 4.2.0-19.23-generic=19 4.2.6)=6
    13	Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 KERNEL=6 supported=9 cpus:=5
    14	Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Intel=5 GenuineIntel=12
    15	Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 AMD=3 AuthenticAMD=12
    16	Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 NSC=3 Geode=5 by=2 NSC=3
    17	Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Cyrix=5 CyrixInstead=12
    18	Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Centaur=7 CentaurHauls=12
    19	Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Transmeta=9 GenuineTMx86=12
    20	Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 Transmeta=9 TransmetaCPU=12
    21	Dec=3 22=2 10:00:46=8 jserver-VirtualBox=17 kernel:=7 [=1 0.000000]=9 UMC=3 UMC=3 UMC=3 UMC=3
$ 
$ 
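$ awk '{
           if (NF > maxnf) maxnf = NF              # widest line seen so far
           for (i=1; i<=NF; i++) {
               # keep the width while it matches every earlier line; -1 marks a mismatch
               x[i] = (NR == 1 || length($i) == x[i]) ? length($i) : -1
           }
           # a column missing from this line cannot have a consistent width
           for (i=NF+1; i<=maxnf; i++) x[i] = -1
       }
       END {
           # loop over every column seen, not just the fields of the last line
           for (i=1; i<=maxnf; i++) {
               if (x[i] != -1) {
                   print "Column ", i, " has ", x[i], " characters across ", NR, " lines"
               }
           }
       }' data.log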
Column  1  has  5  characters across  21  lines
Column  2  has  4  characters across  21  lines
Column  3  has  10  characters across  21  lines
Column  4  has  21  characters across  21  lines
Column  5  has  9  characters across  21  lines
Column  6  has  3  characters across  21  lines
$ 
$

Is the output of the script as per your requirement?
Otherwise, did you want this information to be displayed in some other way?
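The script above reports a column only when its width is identical on every line, while the original request also mentioned columns that match on "most of the 3000 lines". A possible sketch for that looser case is below; it is not taken from the thread, and the function name report_consistent_columns and its optional percentage argument are hypothetical. For each column it counts how many lines have each width, keeps the most common width, and reports the column when that width occurs on at least the given percentage of lines (100 by default, which behaves like the all-lines check). Lines too short to contain a column count as misses for that column.

```shell
# Hypothetical helper: report_consistent_columns [min_percent]
# Reads space-separated lines on stdin. For each column it tracks how many lines
# have each field width, keeps the most common width, and reports the column if
# that width occurs on at least min_percent of all lines (default 100 = every line).
# Lines too short to have a column count as misses for that column.
report_consistent_columns() {
    awk -v pct="${1:-100}" '
    {
        if (NF > maxnf) maxnf = NF
        for (i = 1; i <= NF; i++) {
            w = length($i)
            c = ++cnt[i, w]                      # lines so far with width w in column i
            if (c > best[i]) {                   # new most common width for column i
                best[i] = c
                bestw[i] = w
            }
        }
    }
    END {
        for (i = 1; i <= maxnf; i++)
            if (best[i] * 100 >= pct * NR)
                printf "column %d has %d characters on %d of %d lines\n", i, bestw[i], best[i], NR
    }'
}

# Usage, assuming a log file named logfile:
# tail -n 3000 logfile | report_consistent_columns 90
```

With an argument of 90, it would list the columns whose most common width appears on at least 90% of the last 3000 lines. If two widths tie for most common, the one that reached that count first is the one reported.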

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Add string based on character length

Good day, I am a newbie here and thanks for accepting me I have a task to modify input data where my input data looks like 123|34567|CHINE 1|23|INDIA 34512|21|USA 104|901|INDIA See that my input has two columns with different character length but max length is 5 and minimum length is 0 which... (1 Reply)
Discussion started by: fastlearner
1 Replies

2. UNIX for Dummies Questions & Answers

Select lines based on character length

Hi, I've got a file like this: 22 22:35645163:T:<CN0>:0 0 35645163 T <CN0> 22 rs140738445:20902439:TTTTTTTG:T 0 20902439 T TTTTTTTG 22 rs149602065:40537763:TTTTTTG:T 0 40537763 T TTTTTTG 22 rs71670155:50538408:TTTTTTG:T 0 50538408 T TTTTTTG... (3 Replies)
Discussion started by: zajtat
3 Replies

3. Shell Programming and Scripting

Delimit file based on character length using awk

Hi, I need help with one problem, I came across recently. I have one input file which I need to delimit based on character length. $ cat Input.txt 12345sda231453 asd760kjol62569 sdasw4g76gdf57 And, There is one comma separated file which mentions "start of the field" and "length... (6 Replies)
Discussion started by: Prathmesh
6 Replies

4. Shell Programming and Scripting

awk command to find total number of Special character in a column

How to find total number of special character in a column? I am using awk -f "," '$col_number "*$" {print $col_number}' file.csv|wc -l but its not giving correct output. It's giving output as 1 even though i give no special character? Please use code tags next time for your code and... (4 Replies)
Discussion started by: AjitKumar
4 Replies

5. Shell Programming and Scripting

Need to extract data from Column having variable length column

Hi , I need to extract data from below mentioned data, having no delimiter and havin no fixed column length. For example: Member nbr Ref no date 10000 1000 10202012 200000 2000 11202012 Output: to update DB with memeber nbr on basis of ref no. ... (6 Replies)
Discussion started by: ns64110
6 Replies

6. Shell Programming and Scripting

Add character based on record length

All, I can't seem to find exactly what I'm looking for, and haven't had any luck patching things together. I need to look through a file, and if the record length is not 874, then add 'E' in position 778. Your help is greatly appreciated. (4 Replies)
Discussion started by: CutNPaste
4 Replies

7. Shell Programming and Scripting

Parsing 286 length Character string

Hi Friends, I have .txt file which has 13000 records. Each record is 278 character long. I am using below code to extract the string and it takes almost 10 minutes. Any suggestion please. cat filename.txt|while read line do f1=`echo $line|awk '{print substr($1,1,9)}'` f2=`echo... (6 Replies)
Discussion started by: ppat7046
6 Replies

8. Shell Programming and Scripting

How can use the perl or other command line to calculate the length of the character?

For example, if I have the file whose content are: >HWI-EAS382_30FC7AAXX:7:1:927:1368 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA >HWI-EAS382_30FC7AAXX:7:1:924:1373 ACGAACTTTAAAGCACCTCTTGGCTCGTATGCCGTC I want my output calculate the total length of nucleotide. So my output should look like this:... (1 Reply)
Discussion started by: patrick chia
1 Replies

9. Shell Programming and Scripting

print a file with one column having fixed character length

Hi guys, I have tried to find a solution for this problem but couln't. If anyone of you have an Idea do help me. INPUT_FILE with three columns shown to be separated by - sign A5BNK723NVI - 1 - 294 A7QZM0VIT - 251 - 537 A7NU3411V - 245 - 527 I want an output file in which First column... (2 Replies)
Discussion started by: smriti_shridhar
2 Replies

10. Shell Programming and Scripting

Using Awk script to check length of a character

Hi All , I am trying to build a script using awk that checks columns of the input file and displays message if the column length exceeds 35 char. i have tried the below code but it does not work properly (2 Replies)
Discussion started by: amit1_x
2 Replies