Shell Study Notes: Network Operation Examples

2022-11-05 11:24:06

Excerpted from Linux Shell Scripting Cookbook, Chapter 5: Tangled Web? Not At All!

Parsing data from a website

$ lynx -dump -nolist http://www.johntorres.net/BoxOfficefemaleList.html |grep -o "Rank-.*" | sed -e 's/ *Rank-\([0-9]*\) *\(.*\)/\1\t\2/' | sort -nk 1 > actresslist.txt

1   Keira Knightley
2   Natalie Portman
3   Monica Bellucci

The URL no longer responds; the output above is copied from the book.
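Because the URL is dead, the extraction half of the pipeline can only be tried offline. The sketch below assumes input lines that contain a `Rank-N Name` fragment, as in the lynx dump the book describes. The function name `parse_ranks` and the sample lines are made up for illustration.

```shell
#!/bin/bash
# parse_ranks: hypothetical helper. Reads text on stdin and prints
# "rank<TAB>name" lines sorted numerically by rank.
parse_ranks() {
  local tab
  tab=$(printf '\t')   # literal tab, so the sed replacement does not depend on \t support
  grep -o "Rank-.*" |
    sed -e "s/ *Rank-\([0-9]*\) *\(.*\)/\1${tab}\2/" |
    sort -nk 1
}

# Sample input standing in for the output of: lynx -dump -nolist URL
printf '%s\n' \
  '   Rank-2 Natalie Portman' \
  '   Rank-1 Keira Knightley' \
  'a line without a match' | parse_ranks
```

Lines that do not contain `Rank-` are dropped by `grep -o`, which is why the last sample line does not appear in the result.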

Image crawler and downloader

#!/bin/bash
# Purpose: image downloader
# Filename: img_downloader.sh
if [ $# -ne 3 ]; then
  echo "Usage: $0 URL -d DIRECTORY"
  exit 1
fi
while [ $# -gt 0 ];do
      case $1 in
      -d) shift; directory=$1; shift ;;
      *) url=$1; shift;;
      esac
done
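# The case statement checks the first argument ($1). If it matches -d, the next argument must be the directory, so the arguments are shifted and the directory name is saved. Otherwise the argument is the URL.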

mkdir -p "$directory"
baseurl=$(echo "$url" | grep -Eo "https?://[a-z.-]+")
echo "Downloading $url"
# grep -Eo "<img[^>]*src=[^>]*>" prints only the <img> tags that carry a src attribute
# sed 's/<img[^>]*src=\"\([^"]*\).*/\1/g' extracts the url from the string src="url"
# sed "s,^/,$baseurl/," replaces a leading / with baseurl
curl -s "$url" | grep -Eo "<img[^>]*src=[^>]*>" | sed 's/<img[^>]*src=\"\([^"]*\).*/\1/g' | sed "s,^/,$baseurl/," > /tmp/$$.list
cd "$directory" || exit 1
while read -r filename; do
  echo "Downloading $filename"
  curl -s -O "$filename"
done < /tmp/$$.list

The URL no longer responds; the script above is copied from the book.
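The URL-extraction step can be checked without any network access by feeding it a local HTML string. This is a minimal sketch of the same grep and sed chain. `extract_img_urls`, the sample markup, and the example.com hosts are assumptions for illustration, not part of the book's script.

```shell
#!/bin/bash
# extract_img_urls: hypothetical helper. Reads HTML on stdin and prints one
# image URL per line. $1 is the base URL used to turn root-relative paths
# (such as /logo.png) into absolute URLs.
extract_img_urls() {
  local baseurl=$1
  grep -Eo '<img[^>]*src=[^>]*>' |
    sed 's/<img[^>]*src="\([^"]*\).*/\1/' |
    sed "s,^/,$baseurl/,"
}

# Sample page standing in for: curl -s "$url"
printf '%s\n' '<p><img src="/logo.png" alt="logo"><img class="x" src="http://cdn.example.com/a.jpg"></p>' |
  extract_img_urls "http://example.com"
```

Only paths that start with `/` are rewritten. Relative paths without a leading slash pass through unchanged, which is a limitation the original script shares.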

Web photo album generator

$ cat generate_album.sh 
#!/bin/bash 
# Filename: generate_album.sh
# Purpose: create an album from the images in the current directory

echo "Creating album.." 
mkdir -p thumbs 
# The script redirects everything up to EOF1 (excluding EOF1 itself) into index.html
cat <<EOF1 > index.html 
<html> 
<head> 
<style>
body 
{ 
  width:470px; 
  margin:auto; 
  border: 1px dashed grey; 
  padding:10px; 
}
img 
{
  margin:5px; 
  border: 1px solid black;
} 
</style> 
</head> 
<body> 
<center><h1> #Album title </h1></center> <p> 
EOF1

for img in *.jpg; do 
  # Create a thumbnail 100 pixels wide
  convert "$img" -resize "100x" "thumbs/$img"
  echo "<a href=\"$img\" >" >>index.html
  echo "<img src=\"thumbs/$img\" title=\"$img\" /></a>" >> index.html
done

cat <<EOF2 >> index.html
</p> 
</body> 
</html> 
EOF2

echo Album generated to index.html
$ ./generate_album.sh 
Creating album..
Album generated to index.html
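The here-document is what lets the script keep its HTML skeleton inline. Below is a small sketch of the same technique that writes to stdout instead of index.html, so it can be inspected without ImageMagick installed. `album_header` and `album_entry` are hypothetical names, and the markup is trimmed down from the script above.

```shell
#!/bin/bash
# album_header: print the opening HTML. The EOF1 delimiter is unquoted,
# so $title is expanded inside the here-document.
album_header() {
  local title=$1
  cat <<EOF1
<html>
<body>
<center><h1>$title</h1></center>
<p>
EOF1
}

# album_entry: print the link-plus-thumbnail markup for one image file name.
album_entry() {
  local img=$1
  echo "<a href=\"$img\"><img src=\"thumbs/$img\" title=\"$img\" /></a>"
}

album_header "Holiday"
album_entry "beach.jpg"
```

Quoting the delimiter (`<<'EOF1'`) would turn expansion off, which is the safer choice when the embedded text contains literal `$` characters.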

Twitter command-line client

#!/bin/bash 
# Filename: twitter.sh
# Purpose: basic Twitter client

oauth_consumer_key=YOUR_CONSUMER_KEY 
oauth_consumer_secret=YOUR_CONSUMER_SECRET

config_file=~/.$oauth_consumer_key-$oauth_consumer_secret-rc

if [[ "$1" != "read" ]] && [[ "$1" != "tweet" ]];then 
	echo -e "Usage: $0 tweet status_message\n  OR\n  $0 read\n"
	exit 1
fi



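# source /usr/local/bin/TwitterOAuth.sh
# Use the source command to pull in the TwitterOAuth.sh library so that its predefined functions can be used to access Twitter. The TO_init function initializes the library.
source bash-oauth-master/TwitterOAuth.sh
TO_init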

if [ ! -e $config_file ]; then 
	# Obtain an OAuth token and token secret
	TO_access_token_helper
	if (( $? == 0 )); then
	echo oauth_token=${TO_ret[0]} > $config_file
	echo oauth_token_secret=${TO_ret[1]} >> $config_file 
	fi 
fi

source $config_file

if [[ "$1" = "read" ]];then 
	# The library function TO_statuses_home_timeline fetches tweets from Twitter. It returns the data as one long JSON-formatted string, for example:
	# [{"created_at":"Thu Nov 10 14:45:20 +0000 2016","id":7...9,"id_str":"7...9","text":"Dining...
	TO_statuses_home_timeline '' 'YOUR_TWEET_NAME' '10'
	echo $TO_ret | sed 's/,"/\n/g' | sed 's/":/~/' | \
		awk -F~ '{} {if ($1 == "text") {txt=$2;} else if ($1 == "screen_name") printf("From: %s\n Tweet: %s\n\n", $2, txt);} {}' | tr '"' ' '

elif [[ "$1" = "tweet" ]];then 
	shift
	TO_statuses_update '' "$@"
	echo 'Tweeted :)'

fi

Copied for reference only.

Looking up word definitions via a web server

#!/bin/bash 
# Filename: define.sh
# Purpose: fetch word definitions from dictionaryapi.com

key=YOUR_API_KEY_HERE

if [ $# -ne 2 ]; then
	echo -e "Usage: $0 WORD NUMBER"
	exit 1
fi

# nl prefixes each line with a line number
curl --silent "http://www.dictionaryapi.com/api/v1/references/learners/xml/$1?key=$key" | grep -o '<dt>.*</dt>' | sed 's$</*[a-z]*>$$g' | head -n "$2" | nl

$ ./define.sh usb 1
1 :a system for connecting a computer to another device (such as a printer, keyboard, or mouse) by using a special kind of cord a USB cable/port USB is an abbreviation of "Universal Serial Bus."

Copied for reference only; not verified by running it.
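The text-processing half of define.sh can be exercised without an API key by piping in a local XML fragment. The sketch below assumes the response wraps each definition in `<dt>...</dt>` on a single line, as the script's grep pattern implies. `extract_definitions` and the sample XML are made up for illustration.

```shell
#!/bin/bash
# extract_definitions: hypothetical helper. Reads XML on stdin and prints the
# first $1 definitions, numbered by nl.
# The sed expression uses $ as its delimiter and deletes every opening or
# closing tag made of lowercase letters, such as <dt>, </dt>, or <it>.
extract_definitions() {
  local count=$1
  grep -o '<dt>.*</dt>' |
    sed 's$</*[a-z]*>$$g' |
    head -n "$count" |
    nl
}

# Sample input standing in for the curl response
printf '%s\n' \
  '<entry><dt>:a system for <it>connecting</it> devices</dt></entry>' \
  '<entry><dt>:a second sense</dt></entry>' | extract_definitions 1
```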

Finding broken links on a website

#!/bin/bash 
# Filename: find_broken.sh
# Purpose: find broken links on a website

if [ $# -ne 1 ]; then
	echo -e "Usage: $0 URL\n"
	exit 1
fi

echo Broken links:

mkdir /tmp/$$.lynx 
cd /tmp/$$.lynx

# lynx -traversal URL writes several files into the current working directory, including reject.dat, which contains all the links on the site
lynx -traversal $1 > /dev/null 
count=0;

# sort -u builds a list without duplicates
sort -u reject.dat > links.txt

while read link; do 
	# Inspect the response headers returned by curl -I
	output=`curl -I $link -s  | grep -e "HTTP/.*OK" -e "HTTP/.*200"` 
	if [[ -z $output ]]; then 
		output=`curl -I $link -s | grep -e "HTTP/.*301"` 
		if [[ -z $output ]]; then 
			echo "BROKEN: $link" 
			let count++ 
		else echo "MOVED: $link" 
		fi
	fi 
done < links.txt 

[ $count -eq 0 ] && echo No broken links found.

$ ./find_broken.sh http://10.18.7.30
Broken links:
No broken links found.
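The script decides by grepping the header text, which costs two requests for every link that is not a 200. One possible variation, sketched here rather than taken from the book, asks curl for the numeric status with its `-w '%{http_code}'` write-out option and classifies the number. `classify_code` and `check_link` are hypothetical names.

```shell
#!/bin/bash
# classify_code: hypothetical helper mapping an HTTP status code to the
# labels used by find_broken.sh.
classify_code() {
  case $1 in
    200) echo "OK" ;;
    301) echo "MOVED" ;;
    *)   echo "BROKEN" ;;
  esac
}

# check_link: request only the headers of $1 and classify the numeric status.
# curl prints 000 when it received no HTTP response at all.
check_link() {
  local code
  code=$(curl -s -I -o /dev/null -w '%{http_code}' "$1")
  echo "$(classify_code "$code"): $1"
}

# The classification itself needs no network access:
classify_code 200
classify_code 301
classify_code 404
```

As in the original, only 200 and 301 count as healthy; other redirects such as 302 would be reported as broken unless more cases are added.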

Tracking website changes

#!/bin/bash 
# Filename: change_track.sh
# Purpose: track changes to a page

if [ $# -ne 1 ]; then
	echo -e "Usage: $0 URL\n"
	exit 1
fi

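first_time=0 # not the first run

# [ ! -e "last.html" ] checks whether this is the first run. If last.html does not exist, this is the first run, and the web page must be downloaded and copied to last.html
if [ ! -e "last.html" ]; then
	first_time=1
	# first run
fi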

curl --silent $1 -o recent.html

if [ $first_time -ne 1 ]; then 
	changes=$(diff -u last.html recent.html) 
	if [ -n "$changes" ]; then 
		echo -e "Changes:\n" 
		echo "$changes" 
	else 
		echo -e "\nWebsite has no changes" 
	fi 
else 
	echo "[First run] Archiving.."
fi

cp recent.html last.html

Copied for reference only.
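The comparison step can be tried on two local files instead of a downloaded page. This is a minimal sketch under that assumption. `report_changes` is a hypothetical helper, and the temporary files stand in for last.html and recent.html.

```shell
#!/bin/bash
# report_changes: hypothetical helper. Compares an archived copy ($1) with a
# fresh copy ($2). diff exits with 0 when the files are identical, so the
# else branch runs when they differ (or when diff itself fails).
report_changes() {
  local old=$1 new=$2 changes
  if changes=$(diff -u "$old" "$new"); then
    echo "Website has no changes"
  else
    printf 'Changes:\n\n%s\n' "$changes"
  fi
}

# Demo with two temporary files
tmpdir=$(mktemp -d)
echo "<p>old text</p>" > "$tmpdir/last.html"
echo "<p>new text</p>" > "$tmpdir/recent.html"
report_changes "$tmpdir/last.html" "$tmpdir/recent.html"
rm -rf "$tmpdir"
```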

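Posting to a web page and reading the response

POST and GET are two HTTP request types used to send or retrieve information. In a GET request, the parameters (name-value pairs) are sent as part of the page URL. In a POST request, the parameters are sent in the HTTP message body. POST is commonly used to submit forms with a lot of content or private information.

Here we use guestbook, the sample site bundled with the tclhttpd package. You can download tclhttpd from http://sourceforge.net/projects/tclhttpd and run it on your local system to create a local web server. When a user clicks the Add me to your guestbook button, the page sends a request containing a name and a URL; the information in the request is added to the guestbook page to show who has visited the site.

Download the tclhttpd package and change to its bin directory. Start the tclhttpd daemon:
tclsh httpd.tcl
Use curl to send a POST request and read the site's response (in HTML format):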

$ curl URL -d "postvar=postdata2&postvar2=postdata2"
# or
$ curl http://127.0.0.1:8015/guestbook/newguest.html -d "name=Clif&url=www.noucorp.com&http=www.noucorp.com"

<HTML> 
<Head> 
<title>Guestbook Registration Confirmed</title> </Head>
<Body BGCOLOR=white TEXT=black>
<a href="www.noucorp.com">www.noucorp.com</a>
<DL> <DT>Name <DD>Clif <DT>URL <DD> </DL> 
www.noucorp.com
</Body>

-d submits the user data as a POST request. The string argument to -d has the same form as a GET query string: each var=value pair is separated by &.
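When the pairs come from variables, the string for -d can be assembled in the shell first. This is a small sketch only. `build_query` is a hypothetical helper, and it assumes the values contain no characters that need URL encoding.

```shell
#!/bin/bash
# build_query: hypothetical helper that joins its arguments (each already in
# name=value form) with & into a single form-data string.
# "$*" joins the arguments using the first character of IFS.
build_query() {
  local IFS='&'
  echo "$*"
}

data=$(build_query "name=Clif" "url=www.noucorp.com")
echo "$data"
# The string can then be passed, quoted, to curl:
#   curl http://127.0.0.1:8015/guestbook/newguest.html -d "$data"
```

For values that contain spaces or `&`, curl's own `--data-urlencode` option is likely the better tool, since it encodes each value before sending it.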

wget's --post-data "string" option can also be used to submit data.

$ wget http://127.0.0.1:8015/guestbook/newguest.cgi --post-data "name=Clif&url=www.noucorp.com&http=www.noucorp.com" -O output.html

The name-value format is the same as in cURL. The contents of output.html are the same as what the cURL command returns.

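The string sent by POST (for example, the argument to -d or --post-data) should always be quoted. Otherwise, the shell interprets & as a request to run the command as a background process.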

If you view the site's source (using the browser's View Source option), you will find an HTML form similar to the following:

<form action="newguest.cgi" method="post" >
<ul>
<li> Name: <input type="text" name="name" size="40" >
<li> Url: <input type="text" name="url" size="40" >
<input type="submit" >
</ul> </form>

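Here, newguest.cgi is the target URL. When the user enters the details and clicks the Submit button, the name and URL are sent to newguest.cgi as a POST request, and the response page is returned to the browser.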

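Downloading videos from the Internet

There is a video download tool called youtube-dl. Most distributions do not ship it, and the version in the package repositories may not be current, so it is best to download it from the official site (http://yt-dl.org).

Follow the links and instructions on that page to download and install youtube-dl.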

youtube-dl https://www.youtube.com/watch?v=AJrsl3fHQ74

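Summarizing text with OTS

The Open Text Summarizer (OTS) removes the less important content from a text and produces a concise summary.

Most Linux distributions do not include the ots package; it can be installed with the following command: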

apt-get install libots-devel

ots is simple to use. It reads input from a file or stdin and writes the generated summary to stdout.

ots LongFile.txt | less
# or
cat LongFile.txt | ots | less

ots can also be combined with curl to summarize a website. For example, you can use ots to summarize long-winded blogs:

curl http://BlogSite.org | sed -r 's/<[^>]+>//g' | ots | less
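The sed stage is what turns HTML into plain text before ots sees it. A minimal sketch of that stage on its own, with a made-up sample string, so it can be checked without ots or a network connection; `strip_tags` is a hypothetical name.

```shell
#!/bin/bash
# strip_tags: remove anything that looks like an HTML tag from stdin.
# sed -E enables extended regular expressions; -r is the GNU spelling of the
# same option.
strip_tags() {
  sed -E 's/<[^>]+>//g'
}

printf '%s\n' '<p>Shell scripts are <b>handy</b> for text processing.</p>' | strip_tags
```

The pattern works line by line, so a tag that spans several lines would not be removed; for a rough summary that is usually acceptable.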

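Translating text from the command line

You can reach Google's online translation service through a browser. Andrei Neculau wrote an awk script that accesses the service from the command line and performs translations.

Most Linux distributions do not include this command-line translator, but you can install it directly from Git: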

cd ~/bin 
wget git.io/trans 
chmod 755 ./trans

trans translates text into the language set by the locale environment variables.

$> trans "J'adore Linux"

J'adore Linux

I love Linux

Translations of J'adore Linux French -> English

J'adore Linux I love Linux

You can put an option before the text to control the languages used for translation. The option format is:

from:to

To translate English into French, use the following command:

$> trans en:fr "I love Linux" 
J'aime Linux
