Coding Bash vs. Perl vs. Python (by Evan Liu)


Topics about HTML

HTML: get the title of a list of websites

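In Bash, wget's -O option names the file the page contents are written to; -O - writes them to standard output instead, and the -q (quiet) option turns off wget's own messages. curl, which the Bash script below uses, writes to standard output by default, and its -L option makes it follow redirects.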
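For comparison, here is a rough Python counterpart of wget -q -O - URL 2>/dev/null: a minimal sketch using only the standard library's urllib.request.urlopen. The function name fetch_text, the 10-second timeout, and the UTF-8 fallback when the server names no charset are assumptions made for illustration.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Return the decoded body of url, or None if the request fails.

    Roughly what wget -q -O - URL 2>/dev/null gives a shell script: the page
    text goes to the caller and errors are silently swallowed.
    """
    try:
        with urlopen(url, timeout=timeout) as resp:
            body = resp.read()
            # Charset from the Content-Type header, if the server sent one;
            # UTF-8 is an assumed fallback, not something HTTP guarantees.
            charset = resp.headers.get_content_charset() or "utf-8"
    except (OSError, ValueError):
        # URLError (bad host, unknown scheme, HTTP error) subclasses OSError;
        # ValueError covers malformed URLs.
        return None
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # The server named a codec Python does not know; fall back to UTF-8.
        return body.decode("utf-8", errors="replace")
```

For example, print(fetch_text("http://bing.com")). urlopen follows redirects on its own (much like curl -L) and prints no progress output, so only the error handling needs to be made quiet.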

Tasks:

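Bash:

  # Split the pipe-separated list into an array. The IFS='|' prefix applies
  # only to this read command, so there is no need to unset IFS afterwards.
  site_str="bing.com|google.com"
  IFS='|' read -r -a sites <<<"$site_str"
  for s in "${sites[@]}"; do
    # wget alternatives (-O - writes to stdout, -q silences wget):
    #text=$(wget "$s" -O - 2>/dev/null)
    #text=$(wget "$s" -q -O -)
    # curl writes to stdout by default; -s silences it, -L follows redirects
    text=$(curl -sL "$s")
    # [^<]* keeps the match from running on to a later </title>
    title=$(grep -Eo '<title>[^<]*</title>' <<<"$text")
    printf "website: %s --> %s\n" "$s" "$title"
  done
  -------output----------
  website: bing.com --> <title>Bing</title>
  website: google.com --> <title>Google</title>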

Perl:

  use strict;
  use warnings;
  use LWP::Simple;

  my @sites = split /\|/, 'bing.com|google.com';
  foreach my $s (@sites) {
      # get() returns undef on failure
      my $text = get('http://' . $s) or die "Unable to get page $s";
      # /i: case-insensitive; /s: . matches newlines; .*? stops at the first </title>
      if ($text =~ m{(<title>.*?</title>)}is) {
          printf "website: %s --> %s\n", $s, $1;
      }
  }
  -------output----------
  website: bing.com --> <title>Bing</title>
  website: google.com --> <title>Google</title>

Python:

  import re
  from urllib.request import urlopen

  sites = 'bing.com|google.com'.split('|')
  # re.I: case-insensitive; re.S: . matches newlines; .*? stops at the first </title>
  pat = re.compile(r'<title>.*?</title>', re.I | re.S)
  for s in sites:
      with urlopen('http://' + s) as u:
          # read() returns bytes; decode them (assuming UTF-8) before matching
          text = u.read().decode('utf-8', errors='replace')
      m = pat.search(text)
      print('website: %s --> %s' % (s, m.group(0) if m else None))
  -------output----------
  website: bing.com --> <title>Bing</title>
  website: google.com --> <title>Google</title>
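All three versions pull the title out with a regular expression, and a few details decide whether that works: tag names may be upper case or carry attributes, a title may span several lines (grep -o matches one line at a time, so the Bash version would miss it), and a greedy .* can run past the first </title>. A minimal Python sketch of a tighter pattern follows; the name extract_title, accepting attributes via [^>]*, and collapsing whitespace in the result are illustrative choices, not part of the original scripts.

```python
import re

# <title[^>]*> also accepts attributes such as <title lang="en">.
# re.IGNORECASE matches <TITLE>; re.DOTALL lets . cross newlines;
# the lazy .*? stops at the first </title> rather than the last one.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text inside the first <title>...</title>, or None."""
    m = TITLE_RE.search(html)
    if m is None:
        return None
    # Collapse runs of whitespace (including newlines) to single spaces.
    return " ".join(m.group(1).split())
```

In the Python loop, print(s, extract_title(text)) would print just the inner text, e.g. Bing rather than <title>Bing</title>.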
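A regular expression also matches text a browser never treats as the page title, for example a <title> inside an HTML comment. A different technique, swapped in here rather than taken from the scripts above, is to let Python's standard html.parser.HTMLParser tokenize the page and keep only the text of the first title element. TitleParser and parse_title are hypothetical names for this sketch.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._seen = False  # True once a <title> start tag has been seen
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <TITLE> arrives as "title".
        if tag == "title" and not self._seen:
            self._in_title = True
            self._seen = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._parts.append(data)

    def title(self):
        if not self._seen:
            return None
        return " ".join("".join(self._parts).split())


def parse_title(html):
    """Return the whitespace-normalized first title, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title()
```

The parser costs more code than a one-line pattern, but text inside comments and scripts is never collected as the title.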