[ILUG] Re: perl file processing

Marcus Furlong furlongm at hotmail.com
Thu Oct 23 23:20:00 IST 2008


Brian Foster <blf <at> utvinternet.ie> writes:

> 
>  below's a quickly-put-together all-awk(1) solution,
>  albeit if this was my problem I'd be more inclined
>  to do some filtering first, probably with sed(1)
>  like Francis did.
> cheers!
> 	-blf-
> 
> #!/bin/gawk -f
> BEGIN {
> 	state  = 0
> 	ncols  = 0
> 	nlines = 0
> 
> 	STDERR = "/dev/stderr"
> }
> 
> state == 0 && $0 == "=== Stratified cross-validation ===" {
> 	state = 1
> 	next
> }
> 
> state == 1 && $0 == "=== Detailed Accuracy By Class ===" {
> 	state = 2
> 	next
> }
> 
> state == 2 && 2 <= NF && $NF ~ /^[A-Z]$/ {
> 	for (n = 1; n < NF; n++) {
> 		if ($n !~ /^[0-9.]*$/)
> 			next
> 	}
> 	state = 3
> 	ncols = NF
> }
> 
> state == 3 && NF != ncols { exit }	# goto END
> 
> state == 3 && $NF !~ /^[A-Z]$/ { exit }	# goto END
> 
> state == 3 {
> 	for (n = 1; n < ncols; n++)
> 		col[n] += (0 + $n)
> 	nlines++
> 	next
> }
> 
> END {
> #debug	print "EXIT(" FNR "): nlines=" nlines, "ncols=" ncols
> 	if (nlines <= 0) {
> 		print FILENAME ": Data not found, state =", state  >STDERR
> 		exit 1
> 	}
> 	for (n = 1; n < ncols; n++)
> 		print col[n]/nlines
> }
> 

Just a follow-up question to this. I'm trying to use this script for a
variable-width NxN matrix of the following form:

=== Confusion Matrix ===

   a   b   c   d   e   f   g   h   i   j   k   <-- classified as
 154  12  28   7  17   1  10  11  56  20  30 |   a = A
   6 174  11   3   2   2   3   3  16   6  20 |   b = B
   8   7 222   4   7   0   9   6  24   9  34 |   c = D
   8   3  21 154  20   9  37   4  42  45  29 |   d = F
   8   0   8   4 277   1   7   2   9  15  11 |   e = G
   4   3  11   7  13 185  43   5  32  19  18 |   f = Q
   0   1   6  13  16   9 242   2   9  36  19 |   g = T
   9   9  29   3  14   3  17 139  59  25  40 |   h = U
  10   2   3   1   3   2   1   4 167   7  67 |   i = V
  20   5  17  15  16   9  30  13  38 149  38 |   j = X
   6   6   6   1   2   1   2   3  39  10 266 |   k = Z

Using the above script, how can I sum the main diagonal? i.e. the entries from
[0,0] to [N-1,N-1], [0,0] being the top left corner.
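
For reference, a minimal sketch of one way the diagonal sum could be done,
separate from the quoted script. It assumes the matrix follows a line reading
exactly "=== Confusion Matrix ===", that every data row contains a "|", and
that whitespace separates every count (very wide numbers that run together
would break this). The function name diag_sum is made up for illustration.

```shell
# diag_sum FILE...: print the sum of the main diagonal of the confusion matrix.
# Hypothetical helper; the layout assumptions are described above.
diag_sum() {
    awk '
    # Start counting rows once the matrix header is seen.
    $0 == "=== Confusion Matrix ===" { in_matrix = 1; row = 0; next }

    # A blank line after at least one data row ends the matrix.
    in_matrix && row > 0 && NF == 0 { in_matrix = 0 }

    # Data rows contain "|"; the column-letter header row does not.
    # Row number i (1-based) contributes its i-th field.
    in_matrix && /\|/ {
        row++
        sum += $row
    }

    END { print sum + 0 }
    ' "$@"
}
```

Calling diag_sum on a file containing the matrix above should add up
154, 174, 222, 154, 277, 185, 242, 139, 167, 149 and 266.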

Also, is it possible to pass the first string,
"=== Stratified cross-validation ===", in as a variable? I have multiple types
of tests to retrieve from each file, and hence multiple strings of text like the
above to search for. Currently I have just copied and pasted the script for each
string I need to search for.
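
A rough sketch of the variable idea, using awk's -v option to set a variable
before the program starts. The function name section_lines and the variable
name header are made up for illustration, and the sketch assumes each section
begins with a line of the form "=== ... ===". Note that -v interprets
backslash escapes in the value; the header strings here contain none.

```shell
# section_lines HEADER FILE...: print the non-blank lines that follow the
# line equal to HEADER, up to the next "=== ... ===" line.
# Hypothetical helper; HEADER reaches awk through -v.
section_lines() {
    header=$1
    shift
    awk -v header="$header" '
    $0 == header            { found = 1; next }  # start of the wanted section
    found && /^=== .* ===$/ { exit }             # the next section begins here
    found && NF > 0         { print }            # a line inside the section
    ' "$@"
}
```

The same approach would presumably carry over to the quoted script: compare
$0 against a variable in the state == 0 rule instead of the literal string,
and supply the string on the command line with -v for each test type.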



