[CLUG] Script : split a string

Peter Flynn peter at silmaril.ie
Mon Jan 12 22:05:35 GMT 2009


Jean-Pierre Thibert wrote:
> Hello there,
> 
> I'm still working on learning scripting.
> My problem is:
> 
> I get a variable from wget like this:
> DATA="Eddy,Murphy,Cork,125,95.41"
> It is a string with comma separators.

In a bash shell script, awk is probably the best tool here, as you have 
discovered. Perl, Tcl, etc. all have equivalent ways of doing this.

> I need to split it into separate fields, like
> FIRST_NAME="Eddy"
> NAME="Murphy"
> CITY="Cork"
> RUN=125
> AVERAGE=95.41

Where do the field names come from? Are they available in the file you 
downloaded (e.g. as the first line of a CSV file), or did you invent them?

> What I have learned so far makes me think that scripting is file-oriented,
> but I would like to use in-memory variables instead.
> I tried to use awk and I wrote
> 
> awk -v INPUT=$DATA 'BEGIN { n = split(INPUT, parts, ",")}'

BEGIN is usually only used for setting up an awk program, before any 
input is read. The normal technique is to use the echo command to create 
a one-line "file" which you pipe into awk, e.g.:

FIRST_NAME=`echo "$DATA" | awk -F, '{print $1}'`
NAME=`echo "$DATA" | awk -F, '{print $2}'`
CITY=`echo "$DATA" | awk -F, '{print $3}'`
RUN=`echo "$DATA" | awk -F, '{print $4}'`
AVERAGE=`echo "$DATA" | awk -F, '{print $5}'`
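If the script is run by bash (not a plain sh), here is a sketch of a one-pass alternative to the five awk calls that needs no awk at all: bash's read builtin can split on commas itself. The variable names are just the ones from your example. Note that "echo "$DATA" | read ..." would not work, because bash runs each part of a pipeline in a subshell and the variables vanish with it; the here-string (<<<) avoids that.

```shell
#!/usr/bin/env bash
DATA="Eddy,Murphy,Cork,125,95.41"   # sample record from the question

# IFS=, applies only to this read command, so the shell's normal
# word splitting is left alone. Each comma-separated field goes into
# the next variable name; any extra fields would end up in AVERAGE.
IFS=, read -r FIRST_NAME NAME CITY RUN AVERAGE <<< "$DATA"

echo "$FIRST_NAME|$NAME|$CITY|$RUN|$AVERAGE"
```

This prints Eddy|Murphy|Cork|125|95.41, and the five variables stay set in the current shell.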

> As you can see, INPUT passes the string into the "function" by value (I
> don't know if that's the right English expression). But is it a real
> function? parts is an array with the different fields. The function
> works fine: print parts[1] works fine inside the function. My problem
> is that I can't find out how to get the array out of the function.

If you want it all in one pass, then get awk to compose the commands, and 
"execute" them by handing awk's output to the shell's eval:

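# Beware: eval runs whatever awk prints, so a field containing " $ or ` would be interpreted by the shell.
eval "$(echo "$DATA" |
   awk -F, 'BEGIN {n=split("FIRST_NAME,NAME,CITY,RUN,AVERAGE",f)}
                  {for(i=1;i<=n;++i) print f[i] "=\"" $i "\""}')"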
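On getting the array out: awk runs as a separate process, so its arrays disappear when it exits and cannot be passed back to the shell. If you really want an array, one bash-only sketch is to let bash split the string into its own array (the name parts simply mirrors your awk code; bash arrays start at index 0, where awk's split() starts at 1):

```shell
#!/usr/bin/env bash
DATA="Eddy,Murphy,Cork,125,95.41"   # sample record from the question

# -a stores the comma-separated fields in an indexed array.
IFS=, read -r -a parts <<< "$DATA"

echo "${#parts[@]} fields"          # number of fields, like awk's n
for i in "${!parts[@]}"; do         # loop over the array's indices
    echo "parts[$i] = ${parts[$i]}"
done
```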

It's even better if the field names are on the first record; then you can 
process the data as a file, e.g.

FIRST_NAME,NAME,CITY,RUN,AVERAGE
Eddy,Murphy,Cork,125,95.41
Peter,Flynn,Cork,119,85.32
Jean-Pierre,Thibert,France,135,89.46

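Then you can create a series of commands to do something with the data, 
and pipe them into sh to run them (the variables are exported so that 
analyse.sh can see them):

awk -F, 'NR==1 {n=split($0,f); next}
         {for(i=1;i<=n;++i) print "export " f[i] "=\"" $i "\""; print "bash analyse.sh"}' data | sh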
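For the file-with-header case, here is a sketch in bash alone, assuming simple CSV (no commas or quotes inside fields) and header names that are valid shell variable names. printf -v (bash 3.1 or later) assigns to a variable whose name is held in another variable, so nothing goes through eval. The data is given inline with a here-document to keep the example self-contained, and process_record is a hypothetical stand-in for whatever analyse.sh would do:

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for "bash analyse.sh": just report the values.
process_record() {
    echo "$FIRST_NAME $NAME ($CITY): RUN=$RUN AVERAGE=$AVERAGE"
}

# The braces run in the current shell, so the variables survive afterwards.
{
    IFS=, read -r -a fields                 # header line -> array of names
    while IFS=, read -r -a values; do       # each data line -> array of values
        for i in "${!fields[@]}"; do
            # Set the variable named fields[i] to the matching value.
            printf -v "${fields[$i]}" '%s' "${values[$i]}"
        done
        process_record
    done
} <<'EOF'
FIRST_NAME,NAME,CITY,RUN,AVERAGE
Eddy,Murphy,Cork,125,95.41
Peter,Flynn,Cork,119,85.32
Jean-Pierre,Thibert,France,135,89.46
EOF
```

Because printf -v stores each value literally, a field containing a quote or a $ sign cannot be executed by accident, which is the main risk with the eval approach. To hand the values to a separate script such as analyse.sh instead, they would also need to be exported.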

> By the way, what is the usual way to write variable, function, etc.
> names? Capitals? Lowercase? Something else?

Unix users conventionally use lowercase for everything.
Mainframe users use UPPERCASE for everything.
VAX users separate tokens with a Dollar$Sign.
Java users use camelCase with caps in the middle.
Windows users insert spaces, and then wonder why it doesn't work :-)

///Peter



