[Optimization] Greedy method

Given two sequences of letters A and B, find if B is a subsequence of A in the
sense that one can delete some letters from A and obtain the sequence B.

Greedy领先的思想。(always stay ahead)

[Optimization] Greedy method

There is a line of 111 stalls, some of which need to be covered with boards.
You can use up to 11 boards, each of which may cover any number of
consecutive stalls. Cover all the necessary stalls, while covering as few total
stalls as possible

一块大板,不断去掉大空隙。直到大板被分为要求的11个。

本质:排序“间隙”,先 eliminate 最大的间隙(贪心的体现)

Ref: https://projectalgorithm.wordpress.com/2011/04/25/greedier-than-you/

[Optimization] Greedy method

[Optimization] Greedy method

给定一个长度为m的区间,再给出n条线段的起点和终点(注意这里是闭区间),

求最少使用多少条线段可以将整个区间完全覆盖。

区间长度8,可选的覆盖线段[2,6],[1,4],[3,6],[3,7],[6,8],[2,4],[3,5]

区间覆盖问题  《区间完全覆盖》

先按照“起始点”排序,结果如下:

[Optimization] Greedy method

Prove:

需要最少的线段进行覆盖,那么选取的线段必然要尽量长,而已经覆盖到的区域之前的地方已经无所谓了。

左端点没有太大的意义,所以选择右端点来覆盖。

Find a maximum size subset of compatible activities.

[Optimization] Greedy method

求“一段时间内”能容纳的“最多活动数”。《最大不相交覆盖》

最早结束时间的活动优先。

Transforming any optimal solution to the greedy solution with equal number of
activities

find that proving the greedy solution is also optimal.

证明

greedy exchange, 即证明greedy所得结论不会worse.

Extended:

“最多活动数” ==> "总的最长活动时间“,且”每个活动时间不等“,则,greedy失效,需dynamic programming.

Schedule all the jobs so that the lateness of the job with the largest
lateness is minimised.

最小化任务延迟

只关心deadlines.

证明:(交换论证)

关键步骤的证明是"减少一个逆序de调度导致的最大延迟不会更糟".

A list of n files fi of lengths li which have to be stored on a tape.
Each file is equally likely to be needed. To retrieve a file, one must start from
the beginning of the tape and scan it until the tape is found and read.

Order the files on the tape so that the average (expected) retrieval time
is minimised.

如果p不再均匀,则比较P/L,如下。

[Optimization] Greedy method

[Optimization] Greedy method

用最少的钉子穿插所有的木条。

Greedy的对象的选择问题:

    • 穿插最多的优先 (greedy 失败)
    • 最早结束的优先 (greedy 成功)

Assume you are given n sorted arrays of different sizes. You are allowed
to merge any two arrays into a single new sorted array and proceed in
this manner until only one array is left. Design an algorithm that
achieves this task and uses minimal total number of moves of elements of
the arrays. Give an informal justification why your algorithm is optimal.

类似Merge sort过程的,huffman code原理的东东。

较小的块优先合并。

合并后的较大的块,如果还有后续的操作,那么前面合并得越大,将会成为后续移动操作中的累赘。

Along the long, straight road from Loololong to Goolagong houses are
scattered quite sparsely, sometimes with long gaps between two
consecutive houses. Telstra must provide mobile phone service to people
who live alongside the road, and the range of Telstras cell base station is
5km. Design an algorithm for placing the minimal number of base
stations alongside the road, that is sufficient to cover all houses.

一个思考:从左到右,从右到左,既然都是greedy,minimum是一样的,但stations的位置却不同。

You are given a connected graph with weighted edges. Find a spanning
tree such that the largest weight of all of its edges is as small as possible.

求加强连通图的最小生成树

为何最优?

    • Kruskal Algo: sort e by cost 以edge为主角,则适合"稀疏图"
    • Prim Algo: start from any vertex, add lightest edge one by one. 以vertex为主角,则适合"稠密图"

Ref: 最小生成树-Prim算法和Kruskal算法

Design an algorithm which produces a minimum spanning tre T 0 for the

new graph containing the additional vertex vn+1 and which runs in time

O(n log n).

New vertex与其他n个vertex的边做排序。

There are n radio towers for broadcasting tsunami warnings. You are

given the coordinates of each tower and its radius of range. When a

tower is activated, all towers within the radius of range of the tower will

also activate, and those can cause other towers to activate and so on.

You need to equip some of these towers with seismic sensors so that

when these sensors activate the towers where these sensors are located all

towers will eventually get activated and send a tsunami warning.

The goal is to design an algorithm which finds the fewest number of

towers you must equip with seismic sensors.

[Optimization] Greedy method

总有一个塔,在连锁反应中能激活最多的其他塔。给它安装报警器即可。

Partition the vertices of G into k disjoint subsets so that the minimal
distance between two points belonging to different sets of the partition is as
large as possible. Thus, we want a partition into k disjoint sets which are as
far apart as possible.

[Optimization] Greedy method

Sort the edges in an increasing order and start performing the usual
Kruskal’s algorithm for building a minimal spanning tree,

but stop when you obtain k trees, rather than a single spanning tree.

红线肯定大于任何一条蓝线。

时间复杂度:

n^2条边,O(n^2 log n).

采用“并查集”数据结构后,

we make at most 2n2 calls of the Find operation and

at most n calls of the Union operation.

Assume that a weighted (undirected) graph G = (V, E) has all weights of edges
distinct and that its set of vertices V has been partitioned into two disjoint subsets,
X and V \X and assume that an edge e = (u, v) is the smallest weight edge whose
one end belongs to X and the other end to V\X (Y). Prove that every spanning tree
must contain edge e.

经过e,那么e把图分为了两份。

你丫说不经过e,那么就是经过其他边儿咯?加上你说的这条边,但暂时不删除e,是不是就构成了一个circle?

circle里,谁最小?!当然e最小!

有最小的e,为何还要你说的那条边?!故,你的逻辑有矛盾,得证。

[Optimization] Greedy method

Extended:

Let G = (V, E) be a weighted (undirected) graph has all weights
of edges distinct and let e be the highest weight edge in C.

Prove that e cannot belong to the minimum spanning tree.

证明 circle 里的最大边,不可能属于MST(最小生成树)

反证法:

如果是的话,这条边把图一分为二,再加个其他边,假设为其他方案中选中了这条,而不是这个最大边e,

那么,又出现了一个circle。

可见,还不如不要最大边e为好。

[Optimization] Greedy method

Extended:

Assume that you are given a weighted (undirected) graph G = (V, E) with all weights
of edges distinct and its minimum spanning tree T . Assume now that you add a new
edge e to G. Design a linear time algorithm which produces the minimum spanning
tree for the new graph with the additional edge.

加了新边e,如何更新MST(最小生成树)

Sol:

找个出现的新circle,然后删掉环上的最大边。

[Optimization] Greedy method

Scheduling unit jobs with penalties and deadlines.

The problem of scheduling unit-time tasks with deadlines and penalties for a single processor has the following inputs:

[Optimization] Greedy method a set S = {1, 2, . . . , n} of n unit-time tasks;

[Optimization] Greedy method a set of n integer deadlines d1d2, . . . , dn, such that each di satisfies 1 [Optimization] Greedy method di [Optimization] Greedy method n and task i is supposed to finish by time di; and

[Optimization] Greedy method a set of n nonnegative weights or penalties w1,w2, . . . , wn, such that a penalty wi is incurred if task i is not finished by time di and no penalty is incurred if a task finishes by its deadline.

We are asked to find a schedule for S that minimizes the total penalty incurred for missed deadlines.

拟阵理论(matroid)

Ref: http://www.it610.com/article/1920678.htm

实现任务的最优调度主要就是利用贪心算法中拟阵的思想。

如果S是一个带期限的单位时间任务的集合,且I是所有独立的任务集构成的集合,则对应的系统 M =(S,I)是一个拟阵。满足如下条件:

    1. S是一个非空有穷集合;
    2. l⊆2^S且ϕ∈l (I为S的非空子集族)
    3. l满足交换性质 (Augmentation):若A∈l,B∈l且|A|<|B|,则∃x∈B−A,使得A∪{x}∈l (这条性质给了我们已知集合B,构造集合A的方法)
    4. l满足遗传性质 (Downward closure):若B∈l,A⊆B,则A∈l. Or, B是S的独立子集,这样B的任意子集也都是S的独立子集。(暗示了我们已知集合B,找出其子集的性质的办法)

利用拟阵解决任务调度问题的算法原理主要就是将最小化迟任务的惩罚之和问题转化为最大化早任务的惩罚之和的问题,也就是说在任务调度的时候优先选择当前任务序列中惩罚最大的任务。

这里,假设集合A存放任务的一个调度。如果存在关于A中任务的一个调度,使得没有一个任务是迟的,称任务集合A是独立的。

Prove:

先证明其是拟阵,可采用最大化早任务的惩罚和的贪心算法。

Extended:

O(n)次独立性检查的每一次都用O(n)时间。如何优化?

并查集。

Assume you have $2, $1, 50c, 20c, 10c and 5c coins to pay for your lunch. Design
an algorithm that, given the amount that is a multiple of 5c, pays it with a minimal
number of coins.

明显是从大面值开始。类似于人民币问题,找零时符合贪心算法。

Prove:

满足“最优子结构性质”,“贪心选择性质”。

link

最优子结构性质:

  一个问题的最优解包含其子问题的最优解。(子结构也是子问题的最优解)

例如95c = 50c + 20c + 20c + 5c 这个贪心算法的结果是最优解,是满足optimal substrcuture的。

少一个20c,本应该推断是75c的最优解;你非要说不是,且有个更好的解,那么就是假设:

“两张纸币就能搞定“,那么在你这个假设的基础上加一个事实:加一个20c成为了“仅需3张纸币就能达到95c的最优解“。

显然,这个假设是与原问题冲突的。故,这里若使用贪心算法所得结果是满足“最优子结构性质”。

贪心选择性质:

整体最优解,可以通过一系列局部最优的选择来达到。(每张纸币的量已是最优,不可能更大)

贪心算法的结果应保证了“每个面额不可能更多,即已是最大”。因为贪心嘛。

假设存在其他更优算法。

因为,总额不变,纸币量减少的话:减少任意的相对小的纸币,都需要更大的面额的纸币来填充差额。

但贪心解已满足每张纸币量达到了max,故产生矛盾。原问题满足“贪心选择性质“。

Extended:

    • 指数增长的面额情况时,可换的硬币的单位(或者称 面值)是 c 的幂,也就是 c0,c1,... ,ck,其中整数 c>1,k>=1。

Prove:

常识一:

对于最优解而言,如果使用了面值为 ci 的硬币去找零,那么 ci 最多只能使用 c-1 个。若使用c个的话,c*ci 意味着可以使用一张更大面额纸币来替换众多小纸币。

根据常识一,

如果非贪心是最优的,非贪心使用的所有面值为c^i的硬币个数应该小于c。否则,不是最优!

There are N robbers who have stolen N items. You would like to distribute the items
amongst the robbers (one item per robber). You know the precise value of each item.
Each robber has a particular range of values they want their item to be worth (too
cheap and they will not have made money, too expensive and they will draw a lot
of attention). Devise an algorithm that can distribute the items so each robber is
happy or determines that there is no such distribution.

Sol:

对物品价值v升序排序。遍历每一个物品价值v:

最小bound在范围内,即当前v的左边,最大bound也在(这是默认肯定的),这些 j 构成一个集合。

分配集合中“最大bound”最小的那个 to robber。  

[Optimization] Greedy method

例如分配v3时的 j3, j4:j4的tail大,所以把机会留给j3。

隐含的道理是:

    • 只关心tail,不关心head。因为head在此不管有多早似乎都以后的分配都没有意义。
    • 思考:j1与j2可以互换么?

Prove

“Cut-and-paste" arguments.

改变一个逆序,不会变得更糟。因为,减少一个逆序,例如 j2重新在j1之前,那么j2的deadline更久,就更能成立!

Suppose you have n video streams that need to be sent, one after another, over a
communication link. Stream i consists of a total of bi bits that need to be sent, at a
constant rate, over a period of ti seconds. You cannot send two streams at the same
time, so you need to determine a schedule for the streams: an order in which to send
them. Whichever order you choose, there cannot be any delays between the end of
one stream and the start of the next.

Suppose your schedule starts at time 0. We assume that all the
values bi and ti are positive integers. Now, because you're just one user, the link does
not want you taking up too much bandwidth, so it imposes the following constraint,
using a fixed parameter r:

For each natural number t > 0, the total number of bits you send over the time interval from 0 to t cannot exceed rt.

Note that this constraint is only imposed for time intervals that start at 0, not for
time intervals that start at any other value. We say that a schedule is valid if it
satisfies the constraint.

Example.

Suppose we have n = 3 streams, with (b1, t1) = (2000, 1), (b2, t2) = (6000, 2), (b3, t3) = (2000, 1), and suppose the link’s parameter is r = 5000.

Then the schedule that runs the streams in the order 1, 2, 3, is valid, since the constraint (*) is satisfied:

  • t = : the whole first stream has been sent, and < · 1
  • t = : half the second stream has also been sent, and + < · .

Similar calculations hold for t = 3 and t = 4.

link

NB:

不同的stream,单位时间能发送的bit不同。(理解为压缩的效率不同即可)

右侧的 rt 代表了一种动态的limitation。贪心当然是:能尽可能地 reach/follow this limitation as far as possible。

However, 这里只是判断下:是否存在 满足如此条件的schedule。贪心当然是:找最可能的方式,即是保持limitation的距离越远越好。

Design:

(a) In O(nlogn), order the streams in increasing order of bi/ti  , and check if this schedule has the desired property.

(b) To get an ordering in O(n) time define si = r*ti − bi and schedule streams so that you
start with all streams for which si is non-negative, in any order, followed by those for which si is negative, also in any order.

(因为只要把小于r*ti的排在前面,大于的排在后面即可,至于排序操作,确实是多余)

Prove:

交换论证(exchange argument)

只有一台机器,单线程执行某个人交代的的任务,任务的重要性不同。

最小化 总的“重要性*截止时间”

Schedule jobs in a decreasing order of the ratio ri = wi /ti  (重要性/任务时长)

prove:

假设有更好的方案。而后减少逆序看变化。

类似 the tape storage problem。

上一篇:WebService--jax


下一篇:无法创建spool文件