Navigator
- Bounding consumer preference
- Counting problems with Poisson distribution
- Logistic regression
- Reference
Bounding consumer preference
We model consumer preference in the following way. We assume there is an underlying utility function $u:\mathbb{R}^n\to\mathbb{R}$ with domain $[0, 1]^n$; $u(x)$ gives a measure of the utility derived by the consumer from the goods basket $x$. It is reasonable to assume that $u$ is nondecreasing, since more of a good should not lower utility. It is also reasonable to assume that $u$ is concave. This models satiation, or decreasing marginal utility as we increase the amount of goods.
Now suppose we are given some consumer preference data, but we do not know the underlying utility function $u$. Specifically, we have a set of goods baskets $a_1, a_2, \dots, a_m\in [0, 1]^n$, and some information about preferences among them:
$$
\begin{cases} u(a_i)>u(a_j)\quad (i, j)\in\mathcal{P}\\ u(a_i)\geq u(a_j)\quad (i,j)\in\mathcal{P}_{weak} \end{cases}
$$
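Checking whether these data are consistent with a concave nondecreasing utility function is a feasibility problem with the function $u$ as the infinite-dimensional optimization variable. Since the constraints are all homogeneous, we can replace each strict inequality $u(a_i)>u(a_j)$ by $u(a_i)\geq u(a_j)+1$ and express the problem in the form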
$$
\begin{aligned}
\text{find} \quad & u\\
\text{s.t.} \quad & \begin{cases} u:\mathbb{R}^n\to\mathbb{R}\text{ concave and nondecreasing}\\ u(a_i)\geq u(a_j)+1\quad (i,j)\in\mathcal{P}\\ u(a_i)\geq u(a_j)\quad (i,j)\in\mathcal{P}_{weak} \end{cases}
\end{aligned}
$$
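The infinite-dimensional problem above has a finite-dimensional equivalent, because only the values of $u$ at the baskets $a_1,\dots,a_m$ enter the preference constraints. The sketch below follows the standard interpolation argument for concave functions. The symbols $u_i$ (standing for the value $u(a_i)$) and $g_i$ (a supergradient of $u$ at $a_i$) are notation introduced here, not taken from the text above.

```latex
\begin{aligned}
\text{find} \quad & u_1,\dots,u_m\in\mathbb{R},\quad g_1,\dots,g_m\in\mathbb{R}^n\\
\text{s.t.} \quad & g_i\succeq 0, && i=1,\dots,m\\
& u_j\leq u_i+g_i^T(a_j-a_i), && i,j=1,\dots,m\\
& u_i\geq u_j+1, && (i,j)\in\mathcal{P}\\
& u_i\geq u_j, && (i,j)\in\mathcal{P}_{weak}
\end{aligned}
```

If this linear feasibility problem has a solution, then $u(x)=\min_i\,(u_i+g_i^T(x-a_i))$ is a concave nondecreasing function with $u(a_i)=u_i$, so it satisfies all the preference constraints.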
Counting problems with Poisson distribution
In a wide variety of problems the random variable $y$ is nonnegative integer valued, with a Poisson distribution with mean $\mu>0$:
$$
\mathbb{P}(y=k)=\frac{e^{-\mu}\mu^k}{k!}
$$
We are given a number of observations consisting of pairs $(u_i, y_i)$, $i=1, \dots, m$, where $y_i$ is the observed value of $y$ for which the value of the explanatory variable is $u_i\in\mathbb{R}^n$. We model the mean as an affine function of the explanatory variable, $\mu=a^Tu+b$, and seek a maximum likelihood estimate (MLE) of the model parameters $a\in\mathbb{R}^n$ and $b\in\mathbb{R}$ from these data. The likelihood function is
$$
\prod_{i=1}^m\frac{(a^Tu_i+b)^{y_i}\exp(-(a^Tu_i+b))}{y_i!}
$$
and the log-likelihood function is
$$
l(a, b)=\sum_{i=1}^m\left(y_i\log(a^Tu_i+b)-(a^Tu_i+b)-\log(y_i!)\right)
$$
An MLE of the parameters $a$ and $b$ can be obtained by solving the following convex optimization problem, in which the constant term $\log(y_i!)$ has been dropped because it does not depend on $a$ or $b$:
$$
\max_{a,\,b}\ \sum_{i=1}^m\left(y_i\log(a^Tu_i+b)-(a^Tu_i+b)\right)
$$
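To make the objective concrete, here is a minimal sketch that evaluates the Poisson log-likelihood for given parameters. It is written in plain Python rather than MATLAB so it can be run without CVX. The function name `poisson_loglik` and the `include_constant` flag are hypothetical, introduced only for illustration.

```python
import math

def poisson_loglik(a, b, U, y, include_constant=True):
    """Log-likelihood of the affine-mean Poisson model mu_i = a^T u_i + b.

    U is a list of m explanatory vectors (each of length n) and y a list of
    m observed counts. Returns -inf if any mean is not strictly positive.
    """
    total = 0.0
    for u_i, y_i in zip(U, y):
        mu = sum(a_k * u_k for a_k, u_k in zip(a, u_i)) + b
        if mu <= 0:
            return -math.inf  # outside the domain of the log
        total += y_i * math.log(mu) - mu
        if include_constant:
            total -= math.lgamma(y_i + 1)  # log(y_i!)
    return total
```

Calling it with `include_constant=False` gives the objective of the maximization problem. The two versions differ by a quantity that depends only on the data, so they have the same maximizers.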
CVX code
%%
clc;
clear all;
rng(729);
n = 10;
m = 100;
atrue = rand(n, 1); % true model parameter a
btrue = rand; % true model parameter b
u = rand(n, m);
mu = atrue'*u+btrue;
%% generate random variables y from a Poisson distribution
L = exp(-mu);
ns = ceil(max(10*mu));
y = sum(cumprod(rand(ns, m))>=L(ones(ns, 1), :));
% MLE
cvx_begin
variables a(n) bb(1)
maximize sum(y.*log(a'*u+bb)-(a'*u+bb))
cvx_end
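The line that generates `y` in the code above is compact. It appears to use the classical product-of-uniforms method: multiply uniform random numbers together and count how many running products stay at or above $e^{-\mu}$. The sketch below spells this out for a single sample in plain Python. The function name `poisson_sample` and the `max_draws` parameter are hypothetical. The default cap of `ceil(10*mu)` is per sample, whereas the MATLAB code uses one shared cap `ns` for all samples.

```python
import math
import random

def poisson_sample(mu, rng, max_draws=None):
    """Draw one Poisson(mu) variate by multiplying uniforms.

    Counts how many running products of uniform(0, 1) draws stay at or
    above exp(-mu), using at most max_draws draws.
    """
    threshold = math.exp(-mu)
    if max_draws is None:
        max_draws = math.ceil(10 * mu)  # assumption: per-sample cap
    count = 0
    product = 1.0
    for _ in range(max_draws):
        product *= rng.random()
        if product < threshold:
            break  # later products only get smaller
        count += 1
    return count

rng = random.Random(729)
samples = [poisson_sample(3.0, rng) for _ in range(20000)]
print(sum(samples) / len(samples))  # should be close to the mean 3.0
```

The cap truncates the distribution at `max_draws`, which is negligible when the cap is several times larger than the mean.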
Logistic regression
Consider a random variable $y\in\{0, 1\}$ with
$$
\begin{cases} \mathbb{P}(y=1)=p\\ \mathbb{P}(y=0)=1-p \end{cases}
$$
The logistic model has the form
$$
p=\frac{\exp(a^Tu+b)}{1+\exp(a^Tu+b)}
$$
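The log-likelihood that the CVX code maximizes is not written out, so the following derivation sketches it. It assumes $m$ independent observations $(u_i, y_i)$ and writes $p_i$ for the model probability at $u_i$. This indexing is introduced here for illustration.

```latex
\begin{aligned}
l(a,b) &= \sum_{i=1}^m\left(y_i\log p_i+(1-y_i)\log(1-p_i)\right),
\qquad p_i=\frac{\exp(a^Tu_i+b)}{1+\exp(a^Tu_i+b)}\\
&= \sum_{i=1}^m\left(y_i(a^Tu_i+b)-\log\left(1+\exp(a^Tu_i+b)\right)\right)
\end{aligned}
```

The second line uses $\log p_i=(a^Tu_i+b)-\log(1+\exp(a^Tu_i+b))$ and $\log(1-p_i)=-\log(1+\exp(a^Tu_i+b))$. Each term is concave in $(a,b)$, so maximizing $l$ is a convex problem. The CVX objective `y'*U*x-sum(log_sum_exp([zeros(1, m); x'*U']))` is this expression with `x(1)` playing the role of $b$ and `x(2)` the role of $a$.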
%%
rng(729);
% data
a = 1;
b = -5;
m = 100;
u = 10*rand(m, 1);
y = rand(m, 1)<exp(a*u+b)./(1+exp(a*u+b)); % binary variables, 0 or 1
plot(u, y, 'o');
axis([-1, 11, -0.1, 1.1]);
%% cvx
U = [ones(m, 1) u];
% cvx_expert true suppresses the warning CVX prints when it uses the
% successive approximation method to handle exponentials, logarithms and entropy
cvx_solver mosek
cvx_expert true
cvx_begin
variables x(2)
maximize (y'*U*x-sum(log_sum_exp([zeros(1, m); x'*U'])))
cvx_end
ind1 = find(y==1);
ind2 = find(y==0);
av = x(2);
bv = x(1);
us = linspace(-1, 11, 1000)';
ps = exp(av*us+bv)./(1+exp(av*us+bv));
hold on;
plot(us, ps, '-');
plot(u(ind1), y(ind1), 'o');
plot(u(ind2), y(ind2), 'o');
hold off;
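As a cross-check on a fitted pair such as `av`, `bv`, the logistic log-likelihood $\sum_i \left(y_i z_i-\log(1+e^{z_i})\right)$ with $z_i=a u_i+b$ can be evaluated directly. The sketch below does this in plain Python for scalar $u_i$, as in the example above. The function names `log1p_exp` and `logistic_loglik` are hypothetical.

```python
import math

def log1p_exp(z):
    """Numerically stable log(1 + exp(z)), i.e. log_sum_exp of [0, z]."""
    if z > 0:
        return z + math.log1p(math.exp(-z))  # avoids overflow for large z
    return math.log1p(math.exp(z))

def logistic_loglik(a, b, u, y):
    """Log-likelihood of the scalar logistic model p = 1 / (1 + exp(-(a*u + b)))."""
    total = 0.0
    for u_i, y_i in zip(u, y):
        z = a * u_i + b
        total += y_i * z - log1p_exp(z)
    return total
```

With $a=b=0$ every observation has probability $1/2$, so the value is $-m\log 2$. Any better fit should exceed that baseline.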
Reference
S. Boyd and L. Vandenberghe, Convex Optimization, p. 340.