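This is a collection of questions I put together while studying CV; it may be useful for review. These notes were made a long time ago and are not being revised for now.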
CV Question
- What is machine vision?
Machine vision (MV) is the technology and methods used to provide imaging-based automatic inspection and analysis for such applications as automatic inspection, process control, and robot guidance in industry.
Input: image, video, Output: inspection and analysis
Goal: give computers super human-level perception
- Typical perception channel
Representation -> ‘fancy math’ -> output, and representation and output are the parts we are most interested in.
- Common Applications
Automated visual inspection, object recognition, face detection, face makeovers, vision in cars, image stitching, virtual fitting, VR, KinectFusion, 3D reconstruction.
- Subject connection
Image processing: digital image processing is the use of a digital computer to process digital images through an algorithm. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing.
Computer Graphics: Computer graphics is the discipline of generating images with the aid of computers.
Pattern Recognition: Pattern recognition is the automated recognition of patterns and regularities in data.
Computer Vision: Computer vision is an interdisciplinary scientific field that deals with how computers can be made to gain high-level understanding from digital images or videos.
Difference between Computer Vision and Machine Vision: Computer vision refers to the automation of the capture and processing of images, with an emphasis on image analysis. In other words, CV's goal is not only to see, but also to process what it sees and provide useful results based on the observation. Machine vision refers to the use of computer vision in industrial environments, making it a subcategory of computer vision.
Artificial intelligence: Computer science defines AI research as the study of “intelligent agents”: any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.[1] A more elaborate definition characterizes AI as “a system’s ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation.”
- Vision Process
- Feature extraction and region segmentation (low level)
- Modeling and schema representation (mid level)
- Description and understanding (high level)
- Difficulties faced by Machine Vision
- Image ambiguity: When a 3D scene is projected onto a 2D image, the information about depth and about invisible parts is lost. Therefore, 3D objects of different shapes projected onto the image plane may produce the same image.
- Environment factors: Factors in the scene such as lighting, object shapes, surface colors, cameras, and changes in spatial relationships.
- Knowledge guidance: Under different knowledge guidance, the same image will produce different recognition results.
- Large amounts of data: Gray, color, and depth images carry a huge amount of information. This data requires a large storage space and is not easy to process quickly.
- Human Vision System
Physical structure: the HVS is composed of the optical system, the retina, and the visual pathway.
TODO: I am skipping the HVS material for now. If I have extra time, I will fill in the remaining knowledge.
- Key tech in Computer Vision System
- Image processing (smoothing and denoising, standardization, missing/outlier value handling)
- Image feature extraction (shape, texture, color, spatial relations)
- Image recognition (GoogLeNet, ResNet…)
- Image formation
The randomness of the imaging process and the complexity of the imaged object give an image the character of a random signal.
An image basically consists of:
- Illumination component $i(x, y)$
- Reflection component $r(x, y)$
So, the 2D function representation of the image is:
$$f(x, y) = i(x, y) * r(x, y)$$
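The product model f(x, y) = i(x, y) * r(x, y) can be sketched on a tiny grid. This is only an illustration: the function name `form_image`, the 2x2 values, and the convention that reflectance lies in [0, 1] are assumptions made for the example, not part of the notes.

```python
def form_image(illumination, reflectance):
    """Combine i(x, y) and r(x, y) pixel by pixel: f = i * r."""
    return [
        [i * r for i, r in zip(i_row, r_row)]
        for i_row, r_row in zip(illumination, reflectance)
    ]

# Hypothetical scene: strong light on the top row, weaker light on the bottom row.
illumination = [[100.0, 100.0], [50.0, 50.0]]
# Assumed convention: 0 absorbs all light, 1 reflects all light.
reflectance = [[0.5, 1.0], [0.5, 1.0]]

image = form_image(illumination, reflectance)
print(image)  # [[50.0, 100.0], [25.0, 50.0]]
```

Note that the same surface (same reflectance column) produces different pixel values under different illumination, which is one source of the environment-related difficulties listed earlier.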
- Human eye brightness perception range
Total range: $10^{-2}$ to $10^6$, so the contrast is $c = B_{max} / B_{min} = 10^8$, and the relative contrast is $c_r = 100\% \times (B - B_0) / B_0$, where $B_0$ is the background brightness and $B$ is the object brightness.
Relationship between subjective brightness $S$ and actual brightness $B$:
$$S = K \ln{B} + K_0$$
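A small sketch of the three quantities in this part: contrast B_max / B_min, relative contrast 100% × (B − B_0) / B_0, and the logarithmic law S = K ln B + K_0. The notes do not give values for K and K_0, so the defaults below (K = 1, K_0 = 0) are placeholders chosen only to make the code runnable.

```python
import math

def contrast(b_max, b_min):
    """Contrast c = B_max / B_min."""
    return b_max / b_min

def relative_contrast(b, b0):
    """Relative contrast in percent: c_r = 100 * (B - B0) / B0."""
    return 100.0 * (b - b0) / b0

def subjective_brightness(b, k=1.0, k0=0.0):
    """Subjective brightness S = K * ln(B) + K0 (k and k0 are placeholder values)."""
    return k * math.log(b) + k0

# Full perception range of the eye as stated in the notes: 1e-2 to 1e6.
print(contrast(1e6, 1e-2))

# An object of brightness 120 on a background of brightness 100.
print(relative_contrast(120.0, 100.0))

# Each tenfold increase in B adds the same amount (K * ln 10) to S.
step_low = subjective_brightness(10.0) - subjective_brightness(1.0)
step_high = subjective_brightness(1000.0) - subjective_brightness(100.0)
print(step_low, step_high)
```

The last two printed values are equal up to rounding, which is the point of the logarithmic law: equal brightness ratios are perceived as equal brightness steps.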
- Brightness adaptability
The visual system is sensitive to contrast, not to the brightness value itself.
Weber's law: if the brightness of an object differs from the brightness $I$ of the surrounding background by a just-noticeable amount $\Delta I$, the ratio $\Delta I / I$ is approximately constant within a certain range of brightness. Its value is about 0.02, which is called the Weber ratio.
$$\frac{\Delta I}{I} = 0.02$$
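To make the Weber ratio concrete, here is a minimal sketch that checks whether a brightness difference would be noticeable against a given background, using the value 0.02 from the notes. The function names are made up for the example, and the rule ignores the fact that the ratio only holds within a limited brightness range.

```python
WEBER_RATIO = 0.02  # value stated in the notes

def just_noticeable_difference(background):
    """Smallest brightness change assumed to be visible on this background."""
    return WEBER_RATIO * background

def is_noticeable(background, target):
    """True if |target - background| / background reaches the Weber ratio."""
    return abs(target - background) / background >= WEBER_RATIO

# The same absolute difference of 3 units is visible on a dim background
# but not on a bright one, because only the ratio matters.
print(is_noticeable(100.0, 103.0))    # True  (3 / 100 = 0.03)
print(is_noticeable(1000.0, 1003.0))  # False (3 / 1000 = 0.003)
```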
Mach effect: the visual system is less sensitive to very low and very high spatial frequencies and more sensitive to intermediate spatial frequencies. As a result, a brightness overshoot appears where the brightness changes abruptly. This overshoot enhances the outlines of the scene seen by the human eye.
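The overshoot at a brightness step can be imitated with a small center-surround filter. This is a toy model, not a description of the real visual system: the kernel `(-0.25, 1.5, -0.25)` and the edge-replication padding are assumptions chosen so that flat regions stay unchanged and only the step is exaggerated.

```python
def lateral_inhibition(signal, kernel=(-0.25, 1.5, -0.25)):
    """Filter a 1D brightness profile with a center-surround kernel.

    The kernel sums to 1, so constant regions are preserved. Samples
    outside the signal are replaced by the nearest edge value.
    """
    n = len(signal)
    half = len(kernel) // 2
    out = []
    for x in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = min(max(x + k - half, 0), n - 1)  # clamp to the valid range
            acc += weight * signal[idx]
        out.append(acc)
    return out

# A step edge from brightness 10 to brightness 20.
step = [10, 10, 10, 10, 20, 20, 20, 20]
print(lateral_inhibition(step))
# [10.0, 10.0, 10.0, 7.5, 22.5, 20.0, 20.0, 20.0]
```

The dark side dips to 7.5 and the bright side overshoots to 22.5 right at the edge, while the flat parts keep their values.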
- Color imaging model
Light energy itself is colorless. Color is a physiological and psychological phenomenon that occurs when people’s eyes perceive light.
Light wave: light is an electromagnetic wave, and it is characterized by its wavelength.
Young–Helmholtz theory (trichromatic theory): the three types of cone photoreceptors can be classified as short-preferring (violet), middle-preferring (green), and long-preferring (red).
- Color property
Hue: the degree to which a stimulus can be described as similar to or different from stimuli that are described as red, green, blue, and yellow.
Saturation: colorfulness of an area judged in proportion to its brightness.
Intensity: Refers to the degree of light and darkness that the human eye feels due to color stimuli.
Grassmann's laws:
- First law: Two colored lights appear different if they differ in either dominant wavelength, luminance, or purity. Corollary: for every colored light there exists a light with a complementary color such that a mixture of both lights either desaturates the more intense component or gives uncolored (grey/white) light.
- Second law: The appearance of a mixture of light made from two components changes if either component changes. Corollary: a mixture of two colored lights that are non-complementary results in a mixture that varies in hue with the relative intensities of each light and in saturation according to the distance between the hues of each light.
- Third law: There are lights with different spectral power distributions that appear identical. First corollary: such identical-appearing lights must have identical effects when added to a mixture of light. Second corollary: such identical-appearing lights must have identical effects when subtracted (i.e., filtered) from a mixture of light.
- Fourth law: The intensity of a mixture of lights is the sum of the intensities of the components.
- Color
The result of interaction between physical light in the environment and our visual system
- Color Space
- Linear color space
- RGB color space
- HSV color space
- CIE XYZ
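As a rough illustration of moving between the color spaces listed above, the sketch below converts one RGB color to HSV with Python's standard `colorsys` module and to CIE XYZ with a 3x3 matrix. The matrix assumes linear RGB with sRGB primaries and a D65 white point; the notes do not say which RGB space is meant, so treat that choice as an assumption.

```python
import colorsys

# Assumption: linear RGB with sRGB primaries and a D65 white point.
RGB_TO_XYZ = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)

def rgb_to_xyz(rgb):
    """Linear change of color space: each XYZ component is a weighted sum of R, G, B."""
    return tuple(sum(m * c for m, c in zip(row, rgb)) for row in RGB_TO_XYZ)

def rgb_to_hsv(rgb):
    """Non-linear change to hue, saturation, value; all components lie in [0, 1]."""
    return colorsys.rgb_to_hsv(*rgb)

red = (1.0, 0.0, 0.0)
white = (1.0, 1.0, 1.0)

print(rgb_to_hsv(red))    # hue 0.0, full saturation, full value
print(rgb_to_xyz(red))    # the first column of the matrix
print(rgb_to_xyz(white))  # Y (luminance) is approximately 1 for white
```

RGB to XYZ is a matrix product, which is what makes both of them linear color spaces, while HSV needs a non-linear conversion.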
- White Balance
White balance (WB) is the process of removing unrealistic color casts, so that objects which appear white in person are rendered white in your photo.
Color temperature describes the spectrum of light which is radiated from a “blackbody” with that surface temperature.
Von Kries adaptation:
- Multiply each channel by a gain factor
- A more general transformation would correspond to an arbitrary 3x3 matrix
Best way: gray card:
- Take a picture of a neutral object
- Deduce the weight of each channel
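The gray-card procedure combined with Von Kries style per-channel gains could look roughly like this. The function names and the sample card color are invented for the example, and scaling every channel to the mean of the card's channels is just one possible normalization (scaling to the green channel is another common choice).

```python
def gains_from_gray_card(card_rgb):
    """Per-channel gains that map the measured card color to a neutral gray.

    Assumption: the neutral target is the mean of the three measured channels.
    """
    target = sum(card_rgb) / 3.0
    return tuple(target / c for c in card_rgb)

def apply_gains(pixel, gains):
    """Von Kries style correction: multiply each channel by its own gain."""
    return tuple(p * g for p, g in zip(pixel, gains))

# Hypothetical measurement of a gray card under warm light (too much red).
card = (0.8, 0.5, 0.4)
gains = gains_from_gray_card(card)

balanced_card = apply_gains(card, gains)
print(gains)
print(balanced_card)  # all three channels are (almost exactly) equal
```

The same gains are then applied to every pixel of the photo taken under that light. A full 3x3 matrix would replace the three independent multiplications with a matrix product.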
Brightest pixel assumption (non-saturated)
- Highlights usually have the color of the light source
- Use weights inversely proportional to the values of the brightest pixels
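A minimal sketch of the brightest-pixel idea, under a few assumptions that the notes do not spell out: pixel values are in [0, 1], a pixel counts as saturated when any channel reaches `saturation_level`, and the "brightest pixels" are taken per channel (the maximum of each channel over the non-saturated pixels).

```python
def white_patch_gains(pixels, saturation_level=1.0):
    """Gains inversely proportional to the brightest non-saturated value per channel."""
    usable = [p for p in pixels if max(p) < saturation_level]
    if not usable:
        raise ValueError("no non-saturated pixels available")
    brightest = [max(p[c] for p in usable) for c in range(3)]
    if min(brightest) <= 0.0:
        raise ValueError("a channel is zero in every usable pixel")
    return tuple(1.0 / b for b in brightest)

# Hypothetical image: one dark pixel, one highlight, one clipped pixel.
pixels = [(0.2, 0.1, 0.1), (0.8, 0.6, 0.4), (1.0, 1.0, 1.0)]
gains = white_patch_gains(pixels)
print(gains)

# After applying the gains, the highlight becomes (approximately) white.
highlight = tuple(v * g for v, g in zip((0.8, 0.6, 0.4), gains))
print(highlight)
```

The clipped pixel is ignored because its true color is unknown once a channel has hit the sensor limit.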
Gamut mapping
- Gamut: convex hull of all pixel colors in an image
- Find the transformation that matches the gamut of the image to the gamut of a “typical” image under white light
- Mathematical representation of an image
Optical radiation power of wavelength