HTML in Bash, Perl, Python

HTML: get the title of a list of websites

From the shell, wget's -O option names the file that receives the page contents. wget -O - writes the contents to standard output instead, and the -q (quiet) option turns off wget's own progress output.
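For comparison with the flags just described, here is a minimal Python sketch that assembles the same wget command line and can run it. The helper names wget_command and fetch_with_wget are hypothetical (they are not part of the original scripts), and the sketch assumes wget is installed and on the PATH.

```python
import subprocess

def wget_command(url, output="-", quiet=True):
    """Build a wget argument list; output="-" means standard output."""
    cmd = ["wget"]
    if quiet:
        cmd.append("-q")          # suppress wget's own progress messages
    cmd += ["-O", output, url]    # -O names the destination; "-" is stdout
    return cmd

def fetch_with_wget(url):
    """Run wget and return the page contents captured from stdout (bytes)."""
    result = subprocess.run(wget_command(url), stdout=subprocess.PIPE, check=True)
    return result.stdout
```

Building the argument list separately from running it makes the flag handling easy to inspect without touching the network.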

Tasks:


The Bash, Perl, and Python versions follow, in that order:

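site_str="bing.com|google.com"
# The IFS='|' prefix applies only to this read, so no "unset IFS" is needed
IFS='|' read -r -a sites <<<"$site_str"
for s in "${sites[@]}"; do
  #text=$(wget "$s" -O - 2>/dev/null)
  #text=$(wget "$s" -q -O -)
  # -s silences curl's progress output, -L follows redirects
  text=$(curl -sL "$s")
  # grep -E replaces the obsolescent egrep; -o prints only the match, -i ignores case
  title=$(grep -Eio "<title>.*</title>" <<<"$text" | head -n 1)
  printf "website: %s --> %s\n" "$s" "$title"
done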

-------output----------

website: bing.com --> <title>Bing</title>
website: google.com --> <title>Google</title>

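use strict;
use warnings;
use LWP::Simple;

my @sites = split /\|/, 'bing.com|google.com';
foreach my $s (@sites) {
  # get() returns undef on failure
  my $text = get('http://' . $s) or die "Unable to get page: $s";
  # Non-greedy match; /i ignores case, /s lets . match newlines
  my ($title) = $text =~ m{(<title>.*?</title>)}is;
  printf "website: %s --> %s\n", $s, $title // '(no title found)';
}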

-------output----------

website: bing.com --> <title>Bing</title>
website: google.com --> <title>Google</title>

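import re
import urllib.request  # Python 3; urllib.urlopen existed only in Python 2

sites = 'bing.com|google.com'.split('|')
# Non-greedy match; re.S lets . span newlines inside the title
pat = re.compile(r'<title>.*?</title>', re.I | re.S)
for s in sites:
  with urllib.request.urlopen('http://' + s) as u:
    # read() returns bytes, so decode before matching
    text = u.read().decode('utf-8', errors='replace')
  m = pat.search(text)
  print('website: %s --> %s' % (s, m.group(0) if m else '(no title found)'))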

-------output----------

website: bing.com --> <title>Bing</title>
website: google.com --> <title>Google</title>
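
Regex matching on raw HTML is fragile: attributes on the tag, unusual casing, or line breaks inside the element can defeat a pattern. As an alternative, here is a minimal sketch that uses Python's standard html.parser module to pull out the title text. The names TitleParser and extract_title are hypothetical, and the sketch works on an HTML string that has already been fetched and decoded.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False   # True while inside <title>...</title>
        self.done = False       # True once the first title has been closed
        self.parts = []         # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> also arrives as "title"
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)

def extract_title(html):
    """Return the stripped title text, or None if the page has no title."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.parts).strip()
    return title or None
```

Unlike the regex versions, this returns only the text between the tags (for example "Bing"), not the surrounding <title> markup, so the printf/print format would need adjusting if the tags are wanted in the output.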

Coding Bash vs. Perl vs. Python (by Evan Liu)
