Coding Bash vs. Perl vs. Python (by Evan Liu)


This website shows how to accomplish the same tasks in different scripting languages. I built it while learning Bash, Perl and Python in 2013. My Facebook account: liuenmao@hotmail.com.
The website is built with Python and Django.

Topics about regex

Regex: Greedy matching: *, +, ?

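In Bash, egrep, awk and sed can all be used for regular expression matching. All of them support the POSIX classes [[:digit:]] and [[:space:]], and the GNU versions also accept \s. Support for \t varies: awk and GNU sed accept it, egrep does not.
Note that \d is Perl syntax. Plain egrep does not support it; GNU grep accepts it only when run as 'grep -P'.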

In Python, regular expression operations are handled by the re module. Below is a simple summary.
re.split()  --> splits the string into a list, using the pattern as the delimiter
re.match()  --> matches only at the start of the string and returns a match object (or None)
re.search()  --> matches anywhere in the string, finds only the first occurrence and returns a match object (or None)
re.findall()  --> returns a list of all non-overlapping matches in the string
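
The four functions summarized above can be compared side by side. This is a minimal Python 3 sketch; the sample string and variable names are made up for illustration.

```python
import re

line = "one, two,three"

# Split on a comma followed by optional whitespace.
parts = re.split(r",\s*", line)

# re.match only looks at the start of the string, so this returns None.
at_start = re.match(r"two", line)

# re.search scans the whole string but stops at the first hit.
anywhere = re.search(r"t\w+", line)

# re.findall collects every non-overlapping hit.
every = re.findall(r"t\w+", line)

print(parts)             # ['one', 'two', 'three']
print(at_start)          # None
print(anywhere.group())  # two
print(every)             # ['two', 'three']
```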

Tasks:
1.Define a variable holding a multiline String
2.OR condition example in regex
3.AND condition example in regex
4.NOT condition example in regex
		  
Bash Perl Python
Bash:
#1.Define a variable holding a multiline String
read -r -d '' poem <<EOD
Lily-like, white as snow,
she hardly knew
she was a woman, so
sweetly she grew.
EOD
#2.OR condition example in regex
# use egrep, awk, sed respectively
echo "$poem" | egrep '^Lily|woman'
echo "$poem" | awk '/^Lily|woman/'
echo "$poem" | sed -n '/^Lily\|woman/p'
echo "$poem" | sed -e '/^Lily/b' -e '/woman/b' -e d
#3.AND condition example in regex
# use egrep, awk, sed respectively
echo "$poem" | egrep '^she.*woman'
echo "$poem" | awk '/^she.*woman/'
echo "$poem" | awk '/^she/ && /woman/'
echo "$poem" | sed '/^she/!d; /woman/!d'
#4.NOT condition example in regex
# use grep, awk, sed respectively
echo "$poem" | grep -v 'she'
echo "$poem" | awk '!/she/'
echo "$poem" | sed -n '/she/!p'
echo "$poem" | sed '/she/d'
-------output----------
#1.Define a variable holding a multiline String
#2.OR condition example in regex
# use egrep, awk, sed respectively
Lily-like, white as snow,
she was a woman, so
Lily-like, white as snow,
she was a woman, so
Lily-like, white as snow,
she was a woman, so
Lily-like, white as snow,
she was a woman, so
#3.AND condition example in regex
# use egrep, awk, sed respectively
she was a woman, so
she was a woman, so
she was a woman, so
she was a woman, so
#4.NOT condition example in regex
# use grep, awk, sed respectively
Lily-like, white as snow,
Lily-like, white as snow,
Lily-like, white as snow,
Lily-like, white as snow,

Perl:
#1.Define a variable holding a multiline String
my $poem = <<"EOD";
Lily-like, white as snow,
she hardly knew
she was a woman, so
sweetly she grew.
EOD
my @poem = split(/\n/,$poem);
#2.OR condition example in regex
@result = grep { $_ =~ /^Lily|woman/ } @poem;
print "$_\n" foreach (@result);
#3.AND condition example in regex
@result = grep { $_ =~ /^she.*woman/ } @poem;
print "$_\n" foreach (@result);
#4.NOT condition example in regex
@result = grep { !/she/ } @poem;
print "$_\n" foreach (@result);
@result = grep ( !/she/ , @poem );
print "$_\n" foreach (@result);
foreach (@poem) { print "$_\n" unless /she/;}
-------output----------
#1.Define a variable holding a multiline String
#2.OR condition example in regex
Lily-like, white as snow,
she was a woman, so
#3.AND condition example in regex
she was a woman, so
#4.NOT condition example in regex
Lily-like, white as snow,
Lily-like, white as snow,
Lily-like, white as snow,

Python:
import re
#1.Define a variable holding a multiline String
poem='''Lily-like, white as snow,
she hardly knew
she was a woman, so
sweetly she grew.'''
#2.OR condition example in regex
print(re.findall(r'^Lily.*|.*woman.*', poem, flags=re.M))
#3.AND condition example in regex
print(re.findall(r'^she.*woman.*', poem, flags=re.M))
#4.NOT condition example in regex
print([x for x in poem.split('\n') if 'she' not in x])
-------output----------
#1.Define a variable holding a multiline String
#2.OR condition example in regex
['Lily-like, white as snow,', 'she was a woman, so']
#3.AND condition example in regex
['she was a woman, so']
#4.NOT condition example in regex
['Lily-like, white as snow,']
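
The AND examples above rely on 'she' appearing before 'woman' on the line. When the order of the two words is not known, one possible approach (a sketch, not part of the original listings) is to stack two lookahead assertions at the start of each line. The variable name both is made up for illustration.

```python
import re

poem = '''Lily-like, white as snow,
she hardly knew
she was a woman, so
sweetly she grew.'''

# Each lookahead checks one condition without consuming text,
# so the two words may appear in either order on the line.
# '.' does not cross newlines here, so each check stays on one line.
both = re.findall(r'^(?=.*woman)(?=.*she).*$', poem, flags=re.M)
print(both)  # ['she was a woman, so']
```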

Regex: Lazy matching: *?, +?, ??

As a general principle in regex, the quantifiers ?, *, + and {n,m} match as much of the string as possible while still allowing the whole regexp to match.

In Bash, 'grep -P' makes grep use Perl regex syntax, which is needed for the lazy quantifiers.

In Python:
The '*', '+', and '?' quantifiers are all greedy; they match as much text as possible.
The '*?', '+?', and '??' forms match as few characters as possible. This is known as lazy, non-greedy or minimal mode.
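
The same rule applies to {n,m}, which also has a lazy form {n,m}?. The following Python 3 sketch uses made-up sample strings to show both kinds of quantifier side by side.

```python
import re

digits = "2013-12-25"

# Greedy: take as many digits as the upper bound allows.
greedy = re.search(r'\d{2,4}', digits).group()
# Lazy: stop as soon as the lower bound is satisfied.
lazy = re.search(r'\d{2,4}?', digits).group()
print(greedy, lazy)  # 2013 20

quoted = 'say "yes" or "no"'
greedy_quotes = re.findall(r'".+"', quoted)   # runs to the last quote
lazy_quotes = re.findall(r'".+?"', quoted)    # stops at the nearest quote
print(greedy_quotes)  # ['"yes" or "no"']
print(lazy_quotes)    # ['"yes"', '"no"']
```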

Tasks:
1.Define a variable for matching
2.Perform a greedy match
3.Perform a non-greedy match
		  
Bash Perl Python
Bash:
#1.Define a variable for matching
h="<html><head>Neutrino</head></html>"
#2.Perform a greedy match
egrep -o "<h.*>" <<<"$h"
# or use Perl syntax with '-P'
grep -Po "<h.*>" <<<"$h"
#3.Perform a non-greedy match
grep -Po "<h.*?>" <<<"$h"
-------output----------
#1.Define a variable for matching
#2.Perform a greedy match
<html><head>Neutrino</head></html>
# or use Perl syntax with '-P'
<html><head>Neutrino</head></html>
#3.Perform a non-greedy match
<html>
<head>

Perl:
#1.Define a variable for matching
$h="<html><head>Neutrino</head></html>";
#2.Perform a greedy match
$h=~m/(<h.*>)/; print "$1\n";
#3.Perform a non-greedy match
$h=~m/(<h.*?>)/; print "$1\n";
-------output----------
#1.Define a variable for matching
#2.Perform a greedy match
<html><head>Neutrino</head></html>
#3.Perform a non-greedy match
<html>

Python:
import re
#1.Define a variable for matching
h="<html><head>Neutrino</head></html>"
#2.Perform a greedy match
print(re.search(r'<h.*>',h).group())
#3.Perform a non-greedy match
print(re.search(r'<h.*?>',h).group())
-------output----------
#1.Define a variable for matching
#2.Perform a greedy match
<html><head>Neutrino</head></html>
#3.Perform a non-greedy match
<html>

Regex: Substitution


		  
		  

Tasks:
1.Define a variable holding a multiline String
2.replace 'she' with 'She'
		  
Bash Perl Python
Bash:
#1.Define a variable holding a multiline String
poem=$(cat <<EOD
Lily-like, white as snow,
she hardly knew
she was a woman, so
sweetly she grew.
EOD
)
#2.replace 'she' with 'She'
echo "$poem" | sed 's/^she/She/g'
# or (replaces every occurrence, not only at the start of a line)
echo "${poem//she/She}"
-------output----------
#1.Define a variable holding a multiline String
#2.replace 'she' with 'She'
Lily-like, white as snow,
She hardly knew
She was a woman, so
sweetly she grew.
# or (replaces every occurrence, not only at the start of a line)
Lily-like, white as snow,
She hardly knew
She was a woman, so
sweetly She grew.

Perl:
#1.Define a variable holding a multiline String
my $poem = <<"EOD";
Lily-like, white as snow,
she hardly knew
she was a woman, so
sweetly she grew.
EOD
#2.replace 'she' with 'She'
my @poem = split(/\n/,$poem);
foreach (@poem){
    s/^she/She/;
}
print "$_\n" foreach (@poem);
-------output----------
#1.Define a variable holding a multiline String
#2.replace 'she' with 'She'
Lily-like, white as snow,
She hardly knew
She was a woman, so
sweetly she grew.

Python:
#1.Define a variable holding a multiline String
import re
poem='''Lily-like, white as snow,
she hardly knew
she was a woman, so
sweetly she grew.'''
#2.replace 'she' with 'She'
print(re.sub(r'^she', r'She', poem, flags=re.M))
-------output----------
#1.Define a variable holding a multiline String
#2.replace 'she' with 'She'
Lily-like, white as snow,
She hardly knew
She was a woman, so
sweetly she grew.
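
Beyond a plain pattern-for-string swap, Python's re.sub also accepts a count limit and a replacement function, and re.subn reports how many substitutions were made. The sketch below is an illustration only; the variable names are made up and it is not part of the original listings.

```python
import re

poem = '''Lily-like, white as snow,
she hardly knew
she was a woman, so
sweetly she grew.'''

# count=1 limits the substitution to the first match.
first_only = re.sub(r'^she', 'She', poem, count=1, flags=re.M)

# A function can compute each replacement from the match object.
shouted = re.sub(r'\bshe\b', lambda m: m.group().upper(), poem)

# re.subn returns the new string together with the number of replacements.
_, n = re.subn(r'\bshe\b', 'She', poem)

print(first_only)
print(shouted)
print(n)  # 3
```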

Regex: Parentheses capture: ()

In Bash, sed supports the backreferences \1, \2, \3 for values captured by parentheses.
In Perl, the capture variables $1, $2, $3 retrieve the values matched inside the parentheses.
In Python, captured values are read with match.group(1), match.group(2), and so on. One additional feature is the named group, written "(?P<group_name>regexp)"; the P marks a Python-specific extension.
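
Python's re.sub understands the same \1, \2, \3 backreferences as sed, and match.groups() returns all captured values at once. A minimal Python 3 sketch, reusing the sample string from this section (the variable name report is made up):

```python
import re

line1 = "Alpha Centauri-4.3650-1689"

# match.groups() returns every captured value as a tuple.
star, distance, year = re.search(r'(.*)-(.*)-(.*)', line1).groups()

# \1, \2, \3 in the replacement refer back to the groups, as in sed.
report = re.sub(r'(.*)-(.*)-(.*)', r'star->\1 distance->\2 year->\3', line1)

print(star, distance, year)
print(report)  # star->Alpha Centauri distance->4.3650 year->1689
```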

Tasks:
1.Define a string for capturing
2.Capture values by parentheses
		  
Bash Perl Python
Bash:
#1.Define a string for capturing
line1="Alpha Centauri-4.3650-1689"
#2.Capture values by parentheses
# use sed
sed -r 's/(.*)-(.*)-(.*)/star->\1 distance->\2 year->\3/' <<<"$line1"
# or use awk
awk -F'-' '{print "star->"$1" distance->"$2" year->"$3 }' <<<"$line1"
-------output----------
#1.Define a string for capturing
#2.Capture values by parentheses
# use sed
star->Alpha Centauri distance->4.3650 year->1689
# or use awk
star->Alpha Centauri distance->4.3650 year->1689

Perl:
#1.Define a string for capturing
$line1="Alpha Centauri-4.3650-1689";
#2.Capture values by parentheses
$line1 =~ /(.*)-(.*)-(.*)/;
print "star->$1; distance->$2; year->$3\n";
($star, $distance, $year) = $line1 =~ /(.*)-(.*)-(.*)/;
print "star->$star; distance->$distance; year->$year";
-------output----------
#1.Define a string for capturing
#2.Capture values by parentheses
star->Alpha Centauri; distance->4.3650; year->1689
star->Alpha Centauri; distance->4.3650; year->1689

Python:
import re
#1.Define a string for capturing
line1="Alpha Centauri-4.3650-1689"
#2.Capture values by parentheses
# note: group(0) returns the entire match
match=re.search(r'(.*)-(.*)-(.*)', line1)
print("star->%s; distance->%s; year->%s" %(match.group(1), match.group(2), match.group(3)))
# or use re.findall
print(re.findall(r'(.*)-(.*)-(.*)', line1))
print(re.findall(r'([^-]+?)(?=$|-)', line1))
-------output----------
#1.Define a string for capturing
#2.Capture values by parentheses
# note: group(0) returns the entire match
star->Alpha Centauri; distance->4.3650; year->1689
# or use re.findall
[('Alpha Centauri', '4.3650', '1689')]
['Alpha Centauri', '4.3650', '1689']

Regex: Parentheses capture with named group: (?P<>)

In Bash, when the regex match operator =~ is used, the results of the match are saved to the array ${BASH_REMATCH[@]}. The groups are numbered only, since Bash has no named groups.
In Perl, the format for a named capture is (?<group_name>regexp); the values are retrieved from the %+ hash.
In Python, the format for a named group is "(?P<group_name>regexp)"; the P marks a Python-specific extension.
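
In Python, named groups can also be read all at once as a dictionary, much like Perl's %+ hash. A minimal Python 3 sketch, reusing the sample string from this section (the variable names pattern and fields are made up):

```python
import re

line1 = "Alpha Centauri-4.3650-1689"
pattern = re.compile(r'(?P<star>.*)-(?P<distance>.*)-(?P<year>.*)')

match = pattern.search(line1)

# groupdict() maps each group name to the text it captured.
fields = match.groupdict()
print(fields)

# Named groups keep their numbers, so both forms of access work.
print(match.group(1) == match.group('star'))  # True

# groupindex maps a group name to its number.
print(pattern.groupindex['year'])  # 3
```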

Tasks:
1.Define a string for capturing
2.Capture values by parentheses
		  
Bash Perl Python
Bash:
#1.Define a string for capturing
line1="Alpha Centauri-4.3650-1689"
#2.Capture values by parentheses
[[ "$line1" =~ (.*)-(.*)-(.*) ]]
echo "star->${BASH_REMATCH[1]}; distance->${BASH_REMATCH[2]}; year->${BASH_REMATCH[3]}"
-------output----------
#1.Define a string for capturing
#2.Capture values by parentheses
star->Alpha Centauri; distance->4.3650; year->1689

Perl:
#1.Define a string for capturing
$line1="Alpha Centauri-4.3650-1689";
#2.Capture values by parentheses
$line1 =~ /(?<star>.*)-(?<distance>.*)-(?<year>.*)/;
print "star->$+{star}; distance->$+{distance}; year->$+{year}";
-------output----------
#1.Define a string for capturing
#2.Capture values by parentheses
star->Alpha Centauri; distance->4.3650; year->1689

Python:
import re
#1.Define a string for capturing
line1="Alpha Centauri-4.3650-1689"
#2.Capture values by parentheses
match = re.search(r'(?P<star>.*)-(?P<distance>.*)-(?P<year>.*)',line1)
print("star->%s; distance->%s; year->%s" %(match.group('star'), match.group('distance'), match.group('year')))
-------output----------
#1.Define a string for capturing
#2.Capture values by parentheses
star->Alpha Centauri; distance->4.3650; year->1689

Regex: Single line mode by mode modifier: (?s)

(?s) or DOTALL mode makes . match all characters, including newline.
It does not change ^ and $, which keep matching at the start and end of the string (not of each line).
In Perl, use /s or (?s).
In Python, use re.DOTALL or re.S, or (?s).
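
The effect of the flag is easiest to see by running one pattern with and without it. A minimal Python 3 sketch with made-up variable names:

```python
import re

mstr = 'one\ntwo\nthree'

# Without DOTALL, '.' stops at a newline, so the pattern cannot span lines.
without = re.search(r'one.*three', mstr)

# With re.S (or the inline (?s)), '.' also matches newlines.
with_s = re.search(r'one.*three', mstr, re.S)
inline = re.search(r'(?s)one.*three', mstr)

print(without)                # None
print(with_s.group() == mstr) # True
print(inline.group() == mstr) # True
```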

Tasks:
1.Define a multiline string
2.Match all before substring 'two' by DOT MATCH ALL mode
		  
Bash Perl Python
Bash:
#1.Define a multiline string
read -r -d '' mstr <<'EOD'
one
two
three
EOD
#2.Match all before substring 'two' by DOT MATCH ALL mode
# tr removes the NUL byte that 'grep -z' appends to its output
m="$(grep -Pzo '(?s)^.*two' <<<"$mstr" | tr -d '\0')"
echo -n "->$m<-"
-------output----------
#1.Define a multiline string
#2.Match all before substring 'two' by DOT MATCH ALL mode
# tr removes the NUL byte that 'grep -z' appends to its output
->one
two<-

Perl:
#1.Define a multiline string
$mstr = <<'EOD';
one
two
three
EOD
#2.Match all before substring 'two' by DOT MATCH ALL mode
$mstr =~ /^(.*two)/s;
print "->$1<-\n";
# or
$mstr =~ /(?s)^(.*two)/;
print "->$1<-\n";
-------output----------
#1.Define a multiline string
#2.Match all before substring 'two' by DOT MATCH ALL mode
->one
two<-
# or
->one
two<-

Python:
#1.Define a multiline string
import re
mstr='''one
two
three'''
#2.Match all before substring 'two' by DOT MATCH ALL mode
print(re.findall(r'(?s)^(.*two)', mstr))
# or
print(re.findall(r'^(.*two)', mstr, re.S))
-------output----------
#1.Define a multiline string
#2.Match all before substring 'two' by DOT MATCH ALL mode
['one\ntwo']
# or
['one\ntwo']

Regex: Multiline mode by Mode modifier: (?m)

(?m) or MULTILINE mode makes ^ and $ match at the start and end of each line in the string, not only at the start and end of the whole string.
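
A short Python 3 sketch of the difference, using the same three-line string as the tasks below (the variable names are made up):

```python
import re

mstr = 'one\ntwo\nthree'

# Without MULTILINE, ^ and $ refer to the whole string,
# and the whole string is not a single word.
whole = re.findall(r'^\w+$', mstr)

# With re.M, every line gets its own ^ and $.
per_line = re.findall(r'^\w+$', mstr, re.M)

# The inline form (?m) is equivalent to passing re.M.
t_lines = re.findall(r'(?m)^t\w+', mstr)

print(whole)     # []
print(per_line)  # ['one', 'two', 'three']
print(t_lines)   # ['two', 'three']
```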

Tasks:
1.Define a multiline string
2.Match all before substring 'two' by MULTILINE mode
		  
Bash Perl Python
Bash:
#1.Define a multiline string
read -r -d '' mstr <<'EOD'
one
two
three
EOD
#2.Match all before substring 'two' by MULTILINE mode
# tr removes the NUL byte that 'grep -z' appends to its output
m="$(grep -Pzo '(?m)^.*two' <<<"$mstr" | tr -d '\0')"
echo -n "->$m<-"
-------output----------
#1.Define a multiline string
#2.Match all before substring 'two' by MULTILINE mode
# tr removes the NUL byte that 'grep -z' appends to its output
->two<-

Perl:
#1.Define a multiline string
$mstr = <<'EOD';
one
two
three
EOD
#2.Match all before substring 'two' by MULTILINE mode
$mstr =~ /(?m)^(.*two)/;
print "->$1<-\n";
-------output----------
#1.Define a multiline string
#2.Match all before substring 'two' by MULTILINE mode
->two<-

Python:
#1.Define a multiline string
import re
mstr='''one
two
three'''
#2.Match all before substring 'two' by MULTILINE mode
print(re.findall(r'(?m)^(.*two)', mstr))
# or
print(re.findall(r'^(.*two)', mstr, re.M))
-------output----------
#1.Define a multiline string
#2.Match all before substring 'two' by MULTILINE mode
['two']
# or
['two']

Regex: Lookahead assertion: (?=...), (?!...)


		  
		  

Tasks:
1.Define a string
2.Positive lookahead assertion: find all dog colors
3.Negative lookahead assertion: find the colors of all animals that are not dogs
		  
Bash Perl Python
Bash:
#1.Define a string
str="blackcat,yellowdog,whitecat,browndog,redrooster"
#2.Positive lookahead assertion: find all dog colors
grep -Po '([a-z]+)(?=dog)' <<<"$str"
#3.Negative lookahead assertion: find the colors of all animals that are not dogs
grep -Po '(black|yellow|white|brown|red)(?!dog)' <<<"$str"
-------output----------
#1.Define a string
#2.Positive lookahead assertion: find all dog colors
yellow
brown
#3.Negative lookahead assertion: find the colors of all animals that are not dogs
black
white
red

Perl:
#1.Define a string
$str="blackcat,yellowdog,whitecat,browndog,redrooster";
#2.Positive lookahead assertion: find all dog colors
@a=($str =~ m/([a-z]+)(?=dog)/g); print "@a\n";
#3.Negative lookahead assertion: find the colors of all animals that are not dogs
@a=($str =~ m/(black|yellow|white|brown|red)(?!dog)/g); print "@a";
-------output----------
#1.Define a string
#2.Positive lookahead assertion: find all dog colors
yellow brown
#3.Negative lookahead assertion: find the colors of all animals that are not dogs
black white red

Python:
#1.Define a string
import re
animals="blackcat,yellowdog,whitecat,browndog,redrooster"
#2.Positive lookahead assertion: find all dog colors
print(re.findall(r'[a-z]+(?=dog)', animals))
#3.Negative lookahead assertion: find the colors of all animals that are not dogs
print(re.findall(r'(black|yellow|white|brown|red)(?!dog)', animals))
-------output----------
#1.Define a string
#2.Positive lookahead assertion: find all dog colors
['yellow', 'brown']
#3.Negative lookahead assertion: find the colors of all animals that are not dogs
['black', 'white', 'red']
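
A lookahead only inspects the text that follows; it does not become part of the match. The Python 3 sketch below tries to make that visible by checking where the match ends. The variable names are made up for illustration.

```python
import re

animals = "blackcat,yellowdog,whitecat,browndog,redrooster"

# The match stops before 'dog', although 'dog' had to be present.
m = re.search(r'[a-z]+(?=dog)', animals)
print(m.group(), m.end())  # yellow 15

# Because 'dog' was only inspected, it is still there right after the match.
rest = re.match(r'dog', animals[m.end():])
print(rest.group())  # dog
```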

Regex: Lookbehind assertion: (?<=...), (?<!...)


		  
		  

Tasks:
1.Define a string
2.Positive lookbehind assertion: find all animals that are white
3.Negative lookbehind assertion: find all animals that are not white
		  
Bash Perl Python
Bash:
#1.Define a string
str="blackcat,yellowdog,whitecat,browndog,redrooster"
#2.Positive lookbehind assertion: find all animals that are white
grep -Po '(?<=white)([a-z]+)' <<<"$str"
#3.Negative lookbehind assertion: find all animals that are not white
grep -Po '(?<!white)(cat|dog|rooster)' <<<"$str"
-------output----------
#1.Define a string
#2.Positive lookbehind assertion: find all animals that are white
cat
#3.Negative lookbehind assertion: find all animals that are not white
cat
dog
dog
rooster

Perl:
#1.Define a string
$str="blackcat,yellowdog,whitecat,browndog,redrooster";
#2.Positive lookbehind assertion: find all animals that are white
@a=($str =~ m/(?<=white)([a-z]+)/g); print "@a\n";
#3.Negative lookbehind assertion: find all animals that are not white
@a=($str =~ m/(?<!white)(cat|dog|rooster)/g); print "@a\n";
-------output----------
#1.Define a string
#2.Positive lookbehind assertion: find all animals that are white
cat
#3.Negative lookbehind assertion: find all animals that are not white
cat dog dog rooster

Python:
#1.Define a string
import re
animals="blackcat,yellowdog,whitecat,browndog,redrooster"
#2.Positive lookbehind assertion: find all animals that are white
print(re.findall(r'(?<=white)[a-z]+', animals))
#3.Negative lookbehind assertion: find all animals that are not white
print(re.findall(r'(?<!white)(cat|dog|rooster)', animals))
-------output----------
#1.Define a string
#2.Positive lookbehind assertion: find all animals that are white
['cat']
#3.Negative lookbehind assertion: find all animals that are not white
['cat', 'dog', 'dog', 'rooster']
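
One thing to keep in mind with Python's re module is that the pattern inside a lookbehind must have a fixed width. The sketch below (made-up variable names, Python 3) shows an accepted pattern with equal-length alternatives and a rejected variable-width one.

```python
import re

animals = "blackcat,yellowdog,whitecat,browndog,redrooster"

# Alternatives of equal length are accepted inside a lookbehind.
after_color = re.findall(r'(?<=black|white)[a-z]+', animals)
print(after_color)  # ['cat', 'cat']

# A variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r'(?<=[a-z]+)dog')
    error = None
except re.error as exc:
    error = str(exc)
print(error)
```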