[ILUG] [OT] perl and grep

Niall O Broin niall at linux.ie
Fri Mar 19 09:53:15 GMT 2004


On Friday 19 March 2004, Declan.Grady at nuvotem.com (Declan Grady) wrote:

>Hi,
>I've written a small perl proggy to split up ascii spoolfiles into separate 
>ascii files for archiving.
>e.g. a spoolfile containing say 20 order acknowledgements will be split up 
>into 20 separate ascii files, and then converted to pdfs, with a simple html 
>list linking to them.
>My next problem is that to get the list of spoolfile names, I was using grep, 
>since there are lots of files with the same extension...
>grep ACKNOWLEDGMENT spool*sdy | awk 'BEGIN {FS=":"}{print $1}' > filelist
>
>To have this run from within my perl script, do I just use system("grep ..);
>or is there a more clever 'perl way' to do this ?
>
>I thought of opening every sdy file and checking for the word ACKNOWLEDGMENT 
>in the specific line number where it would appear, but I think this would be 
>overkill - maybe not, as I guess grep would open every file anyway ?

Yes, you do need to open every file, but it's not overkill - how else could you
examine their contents? grep has to open every file too. But you don't need
system("grep ..) from within perl - perl was designed as a text processing
language, and it can do everything your grep and awk pipeline does. You could
wrap your existing perl code in something like this:

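foreach $spool (@ARGV) {
    # skip any file we can't open rather than reading from a dead handle
    unless (open(SPOOL, '<', $spool)) {
        warn "can't open $spool: $!\n";
        next;
    }
    # read entire spool file into one variable, in a block to localise $/
    {
        local $/; # undef the input record separator to slurp the whole file
        $_ = <SPOOL>;
    }
    close SPOOL;
    if (/ACKNOWLEDGMENT/) {
        # split into an array of lines (the \n separators are discarded)
        @lines = split /\n/;
        # your code goes here
    }
}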

Observe the deliberate use of perl's $_ variable here, which is the default
argument for many perl functions and operators (the match and the split both
work on it). If I hadn't used $_ there, I'd have had to create a temporary
variable to hold the file contents, and then search and split that variable.
There is definitely an argument that using a named variable makes the code
clearer, but IMO using $_ saves me from having to have an otherwise useless
temporary variable. It's completely readable to a perl person anyway :-)
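
For comparison, here is a rough sketch of the same check written with a named
variable instead of $_. The sub name slurp_lines and the variable $contents
are made up for illustration, and it uses a lexical filehandle and strict,
which is a matter of taste rather than anything the task requires.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of $file (without trailing \n)
# if the file mentions ACKNOWLEDGMENT, or an empty list otherwise.
sub slurp_lines {
    my ($file) = @_;
    open(my $fh, '<', $file) or return;
    my $contents = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    return unless defined $contents && $contents =~ /ACKNOWLEDGMENT/;
    return split /\n/, $contents;
}

# Does nothing if no file names are given on the command line.
foreach my $spool (@ARGV) {
    my @lines = slurp_lines($spool);
    print "$spool: ", scalar(@lines), " lines\n" if @lines;
}
```

The cost is one extra variable name; the gain is that nothing else in the
program can trample on $_ behind your back.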


At the "# your code goes here" you now have the array @lines, containing all
the lines of the spoolfile. It's time for your code to take over, and split
that up into separate files as you mentioned. Bear in mind that @lines
contains lines which don't have a trailing \n (because the split removed it),
whereas if you had done something like

@lines = <SPOOL>;

the lines WOULD have a trailing \n (which chomp @lines would remove).
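
A small self-contained sketch of that difference, assuming a made-up
two-line string in place of a real spoolfile (it is read through an
in-memory filehandle so no file is needed):

```perl
use strict;
use warnings;

my $data = "line one\nline two\n";   # stand-in for a spoolfile's contents

# Reading in list context keeps the trailing "\n" on each line.
open(my $fh, '<', \$data) or die "can't open in-memory file: $!";
my @with_newlines = <$fh>;
close $fh;

# chomp on an array strips the trailing "\n" from every element.
my @chomped = @with_newlines;
chomp @chomped;

# split throws the separators away, so no "\n" survives.
my @from_split = split /\n/, $data;

print "readline: ", ($with_newlines[0] =~ /\n\z/ ? "has" : "no"), " trailing newline\n";
print "split:    ", ($from_split[0]    =~ /\n\z/ ? "has" : "no"), " trailing newline\n";
```

Which form you want depends on what you do next: if you are writing the
lines straight back out to new files, keeping the \n saves you adding it again.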

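You would then run your program (called split_spools here) like this, and the
shell expands the wildcard into the list of names that perl sees in @ARGV:

split_spools spool*sdy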
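
If you would rather build the list of names inside the script than rely on
the shell, perl's built-in glob does the same wildcard expansion. The
following is only a sketch: files_containing is a made-up name, and it reads
line by line and stops at the first hit, so each file is listed once (like
grep -l), whereas the grep | awk pipeline prints a name once per matching line.

```perl
use strict;
use warnings;

# Hypothetical helper: names of the files in @files that contain $word.
sub files_containing {
    my ($word, @files) = @_;
    my @matches;
    foreach my $file (@files) {
        my $fh;
        unless (open($fh, '<', $file)) {
            warn "can't open $file: $!\n";
            next;
        }
        while (my $line = <$fh>) {
            if (index($line, $word) >= 0) {   # plain substring test
                push @matches, $file;
                last;                          # one hit is enough
            }
        }
        close $fh;
    }
    return @matches;
}

# glob expands the wildcard much as the shell would; with no matching
# files the list is empty and nothing is printed.
print "$_\n" for files_containing('ACKNOWLEDGMENT', glob('spool*sdy'));
```

The returned list can be fed straight into your splitting code, so there is
no need for the intermediate filelist file at all.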




Niall



