Perl

Perl was actually my last programming language I learned but keeps me suprised in the possibilities for writing code for small problems in short time. I used to think that Perl is good for cgi's and for programmers who have to learn a script programming language a long time ago. But that is not the case!

Everyone who thinks that should not try to learn Perl by looking at some source code samples. I was a bit confused because of short examples that do everything but explain me how. And so I have to read a beginner book: Introduction to Perl. It is amazing how short you can write programs that changing hundreds of files in seconds as the first example I show you below the links.

External Links


Internal Links


Changing hundreds of text files in about a second

Given the case that you want to change one of your source code projects. There are hundreds of sourcecode files with a header as the following:

 /************************/
 /* author: me           */
 /* email:  me@nospam.no */
 /* date:                */
 /************************/

To change the date you can:

 chomp(my $date = `date`);
 $^I = ".bak";
 while(<>) {
   s#date:.*#date: $date#;
   print;
 }

OR you can such things without writing a program. Just give a command on shell:

 perl -p -i.bak -n -e 's#^/* email:.*#/* email: new@new.new    */#' *.c 
 -p assumes while(<>) {...} around file content and prints current line. 
 -n assumes while(<>) {...} around file content but does not print current line. 
the -i options is as $^I in source code. Keep it empty if you do not want a backup file
  -e program        one line of program (several -e's allowed, omit programfile)
  -i[extension]     edit <> files in place (makes backup if extension supplied)
  -n                assume "while (<>) { ... }" loop around program
  -p                assume loop like -n but print line also, like sed

word counter

To count the words in a text you can use this very short code:

 while(<>) {
  foreach(split) {
	$overall++;
	$wordtable\{$_\}++;
  }
 }
 foreach(%wordtable) {
   print "$_\n";
 }

Useful CPAN tools

unicode utf-8

 # ein ü
 my $s = "\x{00FC}"; 
 # euro character windows-1252

CGI script with euro sign:

 use CGI qw(:all);
 print header(-type 0> 'text/html', -charset => 'iso-8859-1');
 print chr(0x80);
 # or as iso-8859-15
 print header(-type 0> 'text/html', -charset => 'iso-8859-15');
 print chr(0xA4);
 # or as utf-8
 print header(-type 0> 'text/html', -charset => 'utf-8');
 print chr(\x{20AC});

debuging with the help of tokens

 print "line: " . __LINE__;
 print "file: " . __FILE__;
 print "package: " . __PACKAGE__;

open a file

 open(FILEHANDLER, "<", $fullpath)
        or die qq/cannot read "$fullpath" with read access: $!\n/;
 @filecontent = <FILEHANDLER>;
 close(FILEHANDLER);
 print @filecontent;

strings

 $char = chr(0x394);
 $code = ord($char);
 printf "Zeichen d, %#04x\n", $char,$code,$code;
 # ergibt ausgabe: Zeichen "delta" hat den Code 916, 0x394
 print "Zeichen \xC4 und \x{0394} sehen ...\n";
 $string = "wir laufen beim lousberlauf mit\n";
 substr($string, 4, 6) = "rennen"; # laufen durch rennen ersetzen
 print $string;

 $a = "To be or not to be";
 $b = unpack("x6 A6", $a); # ueberspringen, 6 einlesen
 print " $b \n"; # gibt "or not"

escapes for strings in double quotation

 \n  newline
 \r  carriage return
 \t  tab
 \f  page forward
 \b  backspace
 \a  beep
 \e  ascii escape character
 \007 octal ascii value 007
 \x7f hex value 7f=del
 \cC control character ctrl-c
 \\ backslash
 \" double quotation
 \1 following letter small
 \l all following letter small
 \L all letters small until \E
 \u all letters big
 \U all letters big until \E
 \Q protect all not-alphanumerical characters until \E with a backslash
 \E  end of sequence \L, \U ...

operations on strings

 "Fred" x 3 => "FredFredFred"
 5 x 4 => "5555" 
because operator "x" is left ascociated and so five will be converted into a string.

implemented warnings

 use warnings;
 use diagnostics;
or perl -Mdiagnostics ./my_prg
 use strict;

print

 print "Hallo Welt\n";
 print "The answer is ";
 print 6*7;
 print ".\n";
 print "The result is $result.\n";
 print 'The result is ' . $result . '\n';

comparison

numbers

 ==, !=, <, >, <=, >=

strings

 eg, ne, lt, gt, le, ge

input

To get rid of the newline character after asking the user for input use chomp-operator. This operator just deletes the first newline which occurs in a string.

 chomp($text=<STDIN>);
 print $text;

undef and defined

A variable has the the value undef before its first initialization. To test if the variable has the value 'undef' you can use defined operator.

 $input = undef;
 if( defined($input) )
   print" undef \n";

Multimedia Email

 use MIME::Lite;
 my $ifcfg = `ifconfig -a`
 my $mail=MIME::Lite->new(
     From =>'root@asdf', 
     To=>'asdf@asdf.de', 
     Subject=>'Re', 
     Type=>'multipart/mixed', );
 $mail->attach(Type =>'TEXT',Data=>"IP config:\n $ifcfg");
 $mail->attach(
    Type=>'AUTO', 
    Path=>''user.jpg', 
    Disposition =>'attachment', 
    Filename => "weather.$^T.jpg");
 $mail->send;

Numbers

 foreach my $i ($X .. $Y) {...}
 for (my $i=$X; $i<=$Y; $i++) {...}

 1.25
 255.000
 255.0
 7.25e45 # 7.25 times 10 to the power of 45
 -6.5e24
 -12e-24
 -1.2E-23 # the big letter E is also correct

 0
 2001
 -40
integers will be represent internal as double precision floating point numbers but this is transparent to the programmer.
 45_333_1234_01010
you could also split the number with a bottom-dash to make the number better readable.
 0377 # its an octal 377 which is 255 in decimal system
 0xff # 255 as headecimal number
 0b11111111 # 255 as binary number
 0xC0_FF_3A_00_AA # also here you can make it more readable

Numeric Operators

 2 + 3
 5.2-5.31415
 10 % 3 # 10 modulo 3
 2**3   # is 2 to the power of 3 which is 8

random numbers

random numbers in the interval X<= rand <=Y:

 $random = int( rand( (Y-X+1) ) + X ;

if you need random numbers but each execution the same sequence use srand:

 srand SEED;

or better random numbers:

 use Math::TrulyRandom;
 $random = truly_random_value();

 use Math::Random;
 $random = random_uniform();

trigonometric functions

 use Math::Trig;
 $y = acos(3.7);
 # sin, cos, atan2 and complex numbers are also possible.

logarithm

 $log_e = log(VALUE);

 use POSIX qw(log10);
 $log_10 = log10(VALUE);
 $log_n(x) = log_e(x) / log_e(n)

PDL - Perl Data Language

 use PDL;
 # a and b are complex matrices
 $c=$a x $b;

complex numbers

 use Math:Complex;
 $a = Math::Complex->new(3,5);
 $b = Math::Complex->new(2,-2);
 $c = $a * $b;

hex numbers

 $number = hex(2e); # result is 46

big numbers in perl

 use Math::BigInt
 Math::BigInt->new(10293480192340918230948);

Date and Time

 ($day, $month, $year) = (localtime)[3,4,5];

or

 use Time::localtime;
 $tm = localtime;
 ($day, $month, $year) = ($tm->mday, $tm->mon, $tm->year);

Print date in ISO-8601 format:

 use Time::localtime;
 $tm = localtime;
 printf("date is: %04d-%02d-%02d\n", $tm->year+1900, ($tm->mon)+1, $tm->mday);

To get the time in seconds since 1.1.1970:

 use Time::Local;
 $TIME = timelocal($sec, $min, $hours, $mday, $mon, $year);
 $TIME = timegm($sec, $min, $hours, $mday, $mon, $year);

The other way from seconds since 1.1.1970 to the normal format:

 ($seconds, $minutes, $hours, $day_of_month, $month, $year, $wday, $yday, $isdst) = localtime($time);

or

 use Time::localtime; # or Time::gmtime if value is utc format
 $tm = localtime($time);  # or gmtime($time)
 $seconds = $tm->sec;
 # and so on ...

If you want to calculate with time points or dates use the seconds format:

 $when = $now + $difference;
 $then = $now - $difference;

or use CPAN module Date::Calc

 use Date::Calc qw(Add_Delta_Days);
 ($y2, $m2, $d2) = Add_Delta_Days($y, $m, $d, $offset);
 # use Add_Delta_DHMS for a more precise calculation

difference between two dates:

 use Date::Calc qw(Delta_DHMS);
 ($days, $hours, $minutes, $seconds) = 
   Delta_DHMS(
     $y1,$m1,$d1,$h1,$M1,$s1,
     $y2,$m2,$d2,$h2,$M2,$s2);

readable output of date:

 use POSIX qw(strftime);
 $string = strftime($format, 
     $seconds, 
     $minutes, 
     $hour, 
     $day_of_month, 
     $month, 
     $year, 
     $weekday, 
     $yearday, 
     $dst);

high precise measurements:

 use Time::HiRes qw(gettimeofday);
 $t0 = gettimeofday();
 $t1 = gettimeofday();
 $elapsed = $t1-$t0;
 # $elapsed is a floating point number

sleep function with a period beneath a second:

 use Time::HiRes qw(sleep);
 sleep($time_to_sleep);

Arrays

 @a = ("quick", "brown", "fox");
 @a = qw(If the array elements are only words)
 @lines = (<< "END_OF_TEXT" =~ /^\s*(.+)/gm);
    first line of the text
    second line
    and so on
 END_OF_TEXT

 $arr[0] = 1;
 $arr[1] = 1;
 $arr[2] = 1;
 $arr[99] = 1; # results in filling the array with zeros between 3 and 99.
 print $#arr; # gives us 100

 (1,2,3)
 ("fred", 2.5)
 (1..100)
 (5..1) # results in an empty list because order is always up

To give you a short solution for string lists use 'qw' what stands for 'quoted words':

 qw( Fred Barney Betty Wilma Dino )

same as

 ("Fred", "Barney", "Betty", Wilma", "Dino")

qw cause the deletion of all spaces as newline tabs and

The prefix of whitespaces will be deleted by \s in the regular expression.

 $#ARRAY 
represent the index of the last element in the array.
@ARRAY is the size of the array. We can increase the size of the array by one with:
 $#ARRAY = @ARRAY;

If you have arrays with huge space between entries you should better use an hash table.

 $real_array[1000000] = 4711; # use four megabytes of space
 $fake_array{1000000} = 4711; # use only the space for key and value

Disadvantage is the order of the entries. If you have an array you print out the elements in order of the index:

 foreach $element(@real_array) {
   print ...
 }

reverse order:

 foreach $idx ( 0 .. $#real_array ) 
 {
    print $real_array[$#real_array - $idx]
 }

in order output of an hash table:

 foreach $element ( @fake_array{ sort {$a <=> $b] keys %fake_array } ) {
   print $element
 }

reverse order:

 foreach $element ( sort {$a <=> $b] keys %fake_array ) {
   print $element
 }

internal order of the hash table:

 foreach $element (values %fake_array ) {}

sort elements:

 foreach $var ( sort keys %ARRAY ) {}

use entries of an array from which you only have a reference

 foreach $item ( @$ARRRAYREF) {}
 for($i=0;$i<=$#$ARRAYREF; $i++) {}

normalize arrays

If you want that every entry occurs only once you have to delete the double pairs after building the array.

With help of the hash table seen:

 %seen = ();
 @uniq = ();
 foreach $item ( @list) { 
    unless ($seen) {
      $seen = 1;
      push(@uniq, $item);
    }
 }

or

 %seen = ();
 @uniq = grep { ! $seen ++ } @list;

concatenate two arrays

 push(@ARRAY1, @ARRAY2);

reverse arrays

 @REVERSED = reverse @ARRAY;

or

 foreach $element (reverse @ARRAY) {
     # do something with $element
 }

process multiple elements with the function splice

 # remove $N elements from the array
 # from the beginning
 @FRONT = splice(@ARRAY, 0, $N);
 # from the end
 @END = splice(@ARRAY, -$N);

search for the first element with a distinct behavior

 my ($match, $found, $item);
 foreach $item(@array) {
    if(TEST) {
      $match = $item; # store element
      $found = 1;
      last;
    }
 }
 if( $found) {
 #foo
 }
 else#
 {
 # bar
 }

get these elements with a distinct behaviour

 @matching = grep { TEST ($_) } @list;

grep is a short form of:

 @matching = ();
 foreach(@list) {
   push(@matching, $_) if TEST ($_);
 }

sort a list of numbers

The perl function sort sorts in ascii order.

 @sorted = sort { $a <=> $b } @unsorted;

<=> is a numerical comparison operator. It sorts numbers in ascending order. By default "sort" uses the function cmp.

Hashes

Arrays use integer to index its elements. Hashes use always strings. Add an element with:

 $HASH{ $KEY } = $value;
 foreach $e (keys %hashtable) {
   print $e;
 }
 while( ($key, $value) = each(%HASH) ) {
     # do something
 }

To use the keys in order of insertion:

 use Tie::IxHash;
 tie %HASH, "Tie::IxHash";
 # do something with %HASH
 @keys = keys %HASH;

different types of initialization:

 %token = ("if", 23, "while", 42, "for", 4711);

or

 $token{"if"} = 23;
 $token{"while"} = 42;
 $token{"for"} = 4711;

or

 %token = ("if" => 23, "while" => 42, "for" => 4711);

If you do not want to know about the content of an entry but if a given key exists in the table: use exists

 if(exists($HASH)) { #bla 
 }

If you want to lock the hash

 use Hash::Util qw{ 
   lock_keys unlock_keys 
   lock_value unlock_value 
   lock_hash unlock_hash 
 };

If you want to delete an entry:

 delete($HASH);
 delete @hash{"a", "b", "c"};

If you want to put more than one value to a specific key, create a array reference in $hash.

If you have a hash and its value and want to know which key it has:

 LOOKUP;

sort keys:

 @keys = sort {criterion() } (keys %hash);

merge to hashes:

 A, %B);

init hash table with n elements:

 keys(%hash) = $n;

count the elements:

 %count();
 foreach $element ( @ARRAY ) {
    $count++;
 }

realize a graph relation as hash table:

 %father = ("S" => "ab", "A" => "ab", "C" => "ab");

pattern recognition

 $txt =~ m/pattern/; # true if txt contains pattern
 $txt !~ m/pattern/; # true if txt does not contain pattern
 $txt =~ s/old/new/; # replaces string old with string new in string txt

In the pattern you can use \b which means boundary of a word.

 $txt =~ m/\bavailable/; # finds available but not unavailable
 $txt =~ m/dig/i    # find Dig and dig because i means case insensitive

 /i case insensitive
 /x ignores whitespaces
 /g global substitution not only once per line
 /gc do not reset search position after failed match
 /s recognize also linefeed
 /m  for ^ and $
 /o  compiles pattern only once
 /e   right part of a substitution command is a code 
      which has a result value which should be used as replace pattern.
 /ee  

search and replace

Instead of

 $dst = $src;
 $dst =~ s/pattern/replace/;

use this

 ($dst = $src) =~ s/this/that/;

 ($progname = $0) =~ s!^.*/!!;
use basename of program. Here is the delimiter not / but ! and it means search for occurence of all but . and at the end one /. This will be replaced with nothing.
 $capitalword = $word) =~ s/(\w+)/\u\L$1/g
first letter turns to capital letter the rest of the word in lower letter
 \l (kleines L)
Nächstes Zeichen klein
 \u
Nächstes Zeichen groß
 \L
Alles klein bis \E
 \U
Alles groß bis \E
 \E
Endkennzeichnung
 (?=\w)
look ahead
 /@(?=\w+\b)/
after the @ are only letters and word bound allowed
 /(?<=\b\w+)@/
before the @ are only word bound and then letters allowed

patterns with only letters

Simple solution:

 /^[A-Za-z]+$/
but not enough because of the not ascii character like öüä etc.

The best way to do this is the use of unicode properties:

 /^\p{Alphabetic}+$/

or shorter

 /^\pL+$/

 \p{property}/
pattern has this property
 \P{property}/
pattern has not this property

how often does a pattern occur

 $count = 0;
 while(/(\w+)\s+pattern\b/gi {
count++;
 }
 print("%d", count);

n-th occurence of pattern

 /(?:pattern){n-1}pattern/i

matching across multiple lines

 /foo.*bar/s
finds foo in a line and bar in the following line.
 /^begin/m
finds pattern even after a newline \n

read of records which are devided by a separator

 undef $/; # $/ is the default separator
 @chunks = split(/pattern/, <FILEHANDLE>);
by undefing $/ we read the whole file and split it afterwards with function split.

extract part of a text string

 while(<>) {
if( /STARTPATTERN/ .. /END/) {
     # works between start and end with pattern lines.
   }
 }
 while(<>) {
   if( /STARTPATTERN/ ... /END/) {
     # works between start and end without pattern lines.
   }
 }

Fuzzy Matching

are useful if you want to match something what approximately match a pattern.

 use String::Approx qw(amatch);
 if(amatch("MUSTER", @list)) {
   #recognized
 }
 @matches = amatch("MUSTER", @list);
by default that finds all patterns which are below 10 percent difference to the correct match.

matching of a valid email address

use CPAN module Email::Valid

fork()

fork() is as always a function to split the root process into 2 separate processes.

 defined(my $pid=fork) or die "error in execution of fork: $!\n";
 unless( $pid ) {
    child_code();
 }
 else {
    parent_code();
 }
 code_for_both_child_AND_parent();

For frequent misunderstanding here the return values of fork():

File Access

DBM files as database

 use DB_File;
 tie(%db, 'DB_File', '/tmp/asdf.db');
 ... # use %db as normal hash table
 untie %db;

File Test Operations

 die "file $filename already exists.\n" if -e $filename;

 warn "The file is more than 28 days old.\n" if -M $filename > 28;

 -r readable
 -w writeable
 -x executeable
 -o owner 
 -R readable for specific group
 -W 
 -X
 -O
 -e exist
 -s exists and has size retvalue
 -f file
 -d directory
 -l link
 -S socket
 -p pipe or fifo
 -b blockdevice
 -c character device
 -u setuid
 -g setgid
 -k sticky bit
 -t TTY
 -T probably text file
 -B probably binary file
 -M last modify in days
 -A last access in days
 -C last change des Inode in days

more file information with functions stat and lstat

localtime

 my ($sec, $min, $hour, $day, $month, $year, $weekday, $day_in_month) = localtime;

or as GMT format:

 my $now = gmtime;

navigate through file system

 chdir "/etc" or die "cannot change working directory\n"

work file system recursively

Use the module File::Find.

remove files from file system

 unlink "filename1.txt", $filename2;
 unlink glob "*.o";

rename files

 rename "oldname", "newname";

create a new link to a file

 link "existingfile", "linkname";

creation of directories

 mkdir "fred", 0755
 mkdir "asdf, oct($rights);

remove directory

 unlink glob "dir/*";
 rmdir "dir";

change userrights

 chmod 0755, "asdf", "qwer";

change owner

 chown $user, $group, glob "*.o";

sorting hashes

 my %results = ("asdf" => 195, "qwer" => 202);
 my @table = sort sub_sort keys %results;
 sub sub_sort { $results <=> $results }

enviroment variables

 $ENV{'PATH'} = "/home/asdf:$ENV{'PATH'}";
 delete $ENV{'IFS'};
 my $make_result = system "make";

backquotes

 my $now = `date`;
 print "it is $now.\n";

or as line separated array:

 my @user = `who`;

process as file handle

 open DATE, "date|";
 my $now = <DATE>;
 close DATE;

fork

 defined(my $pid = fork) or die "no fork possible: $!\n";
 unless( $pid ) {
   # child process
 }
 # parent process
 waitpid($pid, 0);

send signals to processes

 kill 2, 4201 or die "no SIGINT could be sent to process 4201: $!\n";

catch signals

 sub sig_int_handler {
   # do some work for clean exit
 }

 $SIG{'INT'} = 'sig_int_handler';