2021 ICM
Problem D: The Influence of Music
Music has been part of human societies since the beginning of time as an essential component of cultural heritage. As part of an effort to understand the role music has played in the collective human experience, we have been asked to develop a method to quantify musical evolution. There are many factors that can influence artists when they create a new piece of music, including their innate ingenuity, current social or political events, access to new instruments or tools, or other personal experiences. Our goal is to understand and measure the influence of previously produced music on new music and musical artists.
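- Our approach to quantifying musical evolution: social network analysis, community detection algorithms, and information-diffusion models on social networks.
- Understanding and measuring the influence of previously produced music on new music and musical artists: build a feature-evaluation model of earlier music and musicians (principal component analysis for dimensionality reduction; machine learning via hierarchical clustering) to analyze how musicians and genre characteristics differ within and between genres, and use an LSTM (long short-term memory) network to measure how genres evolve over time.
- Measure a single musician's influence by the degree of that artist's vertex in the musician network and the size of the subnetwork it belongs to. The greater an earlier musician's influence, the more descendant works share similar musical characteristics.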
Some artists can list a dozen or more other artists who they say influenced their own musical work. It has also been suggested that influence can be measured by the degree of similarity between song characteristics, such as structure, rhythm, or lyrics. There are sometimes revolutionary shifts in music, offering new sounds or tempos, such as when a new genre emerges, or there is a reinvention of an existing genre (e.g. classical, pop/rock, jazz, etc.). This can be due to a sequence of small changes, a cooperative effort of artists, a series of influential artists, or a shift within society.
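- Within a subnetwork, the central nodes have greater musical influence, the subnetwork's overall genre closely matches that of its central node, the songs in the subnetwork have highly similar features, and directly connected nodes exert strong musical influence on each other.
- Characteristics of the artists at the central nodes of the music-evolution subnetworks: exceptional talent, a distinctive era, and broad, deep influence on the development of their genre.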
Many songs have similar sounds, and many artists have contributed to major shifts in a musical genre. Sometimes these shifts are due to one artist influencing another. Sometimes it is a change that emerges in response to external events (such as major world events or technological advances). By considering networks of songs and their musical characteristics, we can begin to capture the influence that musical artists have on each other. And, perhaps, we can also gain a better understanding of how music evolves through societies over time.
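- External events, the evolution of musical genres, influence among musical artists, and the influence of music on society (literature review).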
Your team has been identified by the Integrative Collective Music (ICM) Society to develop a model that measures musical influence. This problem asks you to examine evolutionary and revolutionary trends of artists and genres. To do this, your team has been given several data sets by the ICM:
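- Key points: a model that measures musical influence, and the evolutionary trends of artists and genres (these form the paper's final conclusions).
Tie-in with the last question: building a music-network knowledge graph can describe, as a whole, how events and the social environment (social events, political factors, the internet) affect music, and how music evolves.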
- “influence_data” represents musical influencers and followers, as reported by the artists themselves, as well as the opinions of industry experts. These data contain influencers and followers for 5,854 artists in the last 90 years.
- “full_music_data” provides 16 variable entries, including musical features such as danceability, tempo, loudness, and key, along with artist_name and artist_id for each of 98,340 songs. These data are used to create two summary data sets, including:
a. mean values by artist “data_by_artist”,
b. means across years “data_by_year”.
Note: DATA provided in these files are a subset of larger data sets. These files CONTAIN THE ONLY DATA YOU SHOULD USE FOR THIS PROBLEM.
To carry out this challenging project, the ICM Society asks your teams to explore the evolution of music through the influence across musical artists over time, by doing the following:
- Use the influence_data data set or portions of it to create a (multiple) directed network(s) of musical influence, where influencers are connected to followers. Develop parameters that capture ‘music influence’ in this network. Explore a subset of musical influence by creating a subnetwork of your directed influencer network. Describe this subnetwork. What do your ‘music influence’ measures reveal in this subnetwork?
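- Build one or more directed networks of musical influence with a social network analysis model.
The first network is an unweighted directed graph: the nodes are influencers and followers, and each edge points from an influencer to a follower.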
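A minimal sketch of this first network, assuming the influence_data.csv column names listed under Data Descriptions and using only the Python standard library. The function name `build_influence_network` is hypothetical, and the three-row `sample` is made-up illustrative data, not rows from the real file.

```python
import csv
import io
from collections import defaultdict


def build_influence_network(csv_text):
    """Build an unweighted directed graph from influence_data-style CSV text.

    Returns (followers, influencers, names):
      followers[u]   -> set of artists u influenced (edges u -> v)
      influencers[v] -> set of artists that influenced v
      names[id]      -> artist name
    """
    followers = defaultdict(set)
    influencers = defaultdict(set)
    names = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        u, v = row["influencer_id"], row["follower_id"]
        followers[u].add(v)  # edge points from influencer to follower
        influencers[v].add(u)
        names[u] = row["influencer_name"]
        names[v] = row["follower_name"]
    return followers, influencers, names


# Hypothetical sample rows (not taken from the real data set).
sample = """influencer_id,influencer_name,influencer_main_genre,influencer_active_start,follower_id,follower_name,follower_main_genre,follower_active_start
1,Artist A,Pop/Rock,1950,2,Artist B,Pop/Rock,1960
1,Artist A,Pop/Rock,1950,3,Artist C,Jazz,1960
2,Artist B,Pop/Rock,1960,3,Artist C,Jazz,1960
"""

followers, influencers, names = build_influence_network(sample)
print(sorted(followers["1"]))  # ['2', '3']
```

For the real file, the text could be read with `open("influence_data.csv", encoding="utf-8").read()`. Storing edges in sets silently drops repeated influencer-follower rows, which keeps the graph unweighted.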
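- Develop parameters that capture ‘music influence’ in this network.
A community detection algorithm is used to explore how musical influence propagates; it identifies tightly connected subnetworks. We define an artist's musical influence as a weighted average of the node's degree and the size of the subnetwork it belongs to.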
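The influence measure described above can be sketched as follows. Several choices here are assumptions rather than details from the notes: "degree" is read as out-degree (number of followers), both quantities are normalized by their maximum, and the equal weights `w_degree` and `w_size` are hypothetical. The community labels are taken as given from any community detection step.

```python
from collections import Counter


def influence_scores(followers, community, w_degree=0.5, w_size=0.5):
    """Score each artist as a weighted average of two normalized quantities.

    followers: dict artist -> set of followers (out-edges of the influence graph)
    community: dict artist -> community label from a community-detection step
    w_degree, w_size: hypothetical weights; they should sum to 1.
    """
    out_degree = {a: len(followers.get(a, ())) for a in community}
    size = Counter(community.values())  # number of artists per community
    max_degree = max(out_degree.values()) or 1  # avoid dividing by zero
    max_size = max(size.values())
    return {
        a: w_degree * out_degree[a] / max_degree
        + w_size * size[community[a]] / max_size
        for a in community
    }


# Hypothetical toy input: artist "1" leads community "a".
followers = {"1": {"2", "3"}, "2": {"3"}}
community = {"1": "a", "2": "a", "3": "a", "4": "b"}
scores = influence_scores(followers, community)
print(max(scores, key=scores.get))  # 1
```

Changing the weights shifts the balance between an artist's direct reach and the size of the community that artist sits in. The sensitivity analysis listed later in the notes would be a natural place to vary them.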
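- Explore a subset of musical influence by creating a subnetwork of the directed influencer network. Describe this subnetwork. What do the ‘music influence’ measures reveal in it?
After community detection, 16 subnetworks are identified in the overall music-influence network. Central nodes have greater musical influence, and the overall genre of each subnetwork closely matches that of its central node.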
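The notes do not name the community detection algorithm used. For illustration only, the sketch below substitutes label propagation, a common simple method, run on the undirected version of the influence graph. It is not claimed to reproduce the 16 subnetworks reported above, and the function name is hypothetical.

```python
from collections import Counter


def label_propagation(followers, max_iter=100):
    """Asynchronous label propagation on the undirected version of the graph."""
    neighbors = {}
    for u, vs in followers.items():
        for v in vs:
            neighbors.setdefault(u, set()).add(v)
            neighbors.setdefault(v, set()).add(u)
    labels = {node: node for node in neighbors}  # each node starts alone
    for _ in range(max_iter):
        changed = False
        for node in sorted(neighbors):  # fixed order keeps the result reproducible
            counts = Counter(labels[n] for n in neighbors[node])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            # Keep the current label if it is among the tied best; else take the smallest.
            new = labels[node] if labels[node] in candidates else min(candidates)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy graph: two separate triangles of artists.
followers = {"a": {"b", "c"}, "b": {"c"}, "x": {"y", "z"}, "y": {"z"}}
labels = label_propagation(followers)
print(len(set(labels.values())))  # 2
```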
- Use full_music_data and/or the two summary data sets (with artists and years) of music characteristics, to develop measures of music similarity. Using your measure, are artists within genre more similar than artists between genres?
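For most genres, artists within a genre are more similar than artists between genres.
- Principal component analysis for dimensionality reduction
- Musical feature indicators (abstract concepts: rhythmicity, emotional factors, popularity, ...)
- Cluster analysis results (hierarchical clustering)
- Conclusion: artists within a genre show higher similarity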
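The notes use PCA followed by hierarchical clustering. A simpler check of the same question, swapped in here for illustration, is to standardize the features, compute pairwise cosine similarity, and compare the mean similarity of same-genre pairs with that of cross-genre pairs. The feature values and genre labels below are hypothetical, and all function names are illustrative.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and standard deviation 1."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def within_between(features, genres):
    """Mean pairwise cosine similarity within the same genre vs across genres."""
    z = zscore_columns(features)
    within, between = [], []
    for i, j in combinations(range(len(z)), 2):
        sim = cosine(z[i], z[j])
        (within if genres[i] == genres[j] else between).append(sim)
    return mean(within), mean(between)


# Hypothetical per-artist means (danceability, energy, acousticness).
features = [[0.8, 0.9, 0.1], [0.7, 0.8, 0.2], [0.2, 0.3, 0.9], [0.3, 0.2, 0.8]]
genres = ["Pop/Rock", "Pop/Rock", "Classical", "Classical"]
w, b = within_between(features, genres)
print(w > b)  # True
```

Standardizing first matters with the real data, because tempo (BPM) and loudness (dB) would otherwise dominate the 0 to 1 features.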
- Compare similarities and influences between and within genres. What distinguishes a genre and how do genres change over time? Are some genres related to others?
Visualize both what distinguishes a genre and how genres change over time.
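- A clustering metric that is higher between genres and lower within genres
- A time-series model of genres over time
- A genre influence model
- This question calls for genre-level analysis. From the influencer-follower links in part (1), the dynamic evolution between genres can be inferred, for example genre A gives rise to genre B, which in turn gives rise to genres C and D. Analyzing the data for each genre then shows which characteristics changed during this evolution and what distinguishes the genres. Detailed explanation backed by extensive data analysis and visualization is the core of this question.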
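One way to start inferring genre-to-genre evolution from the part (1) links is to count influence edges by influencer genre, follower genre, and the follower's starting decade. This sketch assumes the influence_data.csv column names, treats a missing genre as "Unknown", and uses a hypothetical function name and made-up sample rows.

```python
import csv
import io
from collections import Counter


def genre_flows(csv_text):
    """Count influence edges keyed by (follower decade, influencer genre, follower genre)."""
    flows = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        decade = row["follower_active_start"]
        src = row["influencer_main_genre"] or "Unknown"  # genre is only given "if available"
        dst = row["follower_main_genre"] or "Unknown"
        flows[(decade, src, dst)] += 1
    return flows


# Hypothetical sample rows (not taken from the real data set).
sample = """influencer_id,influencer_name,influencer_main_genre,influencer_active_start,follower_id,follower_name,follower_main_genre,follower_active_start
1,Artist A,Pop/Rock,1950,2,Artist B,Pop/Rock,1960
1,Artist A,Pop/Rock,1950,3,Artist C,Jazz,1960
2,Artist B,Pop/Rock,1960,3,Artist C,Jazz,1960
"""

flows = genre_flows(sample)
for (decade, src, dst), n in sorted(flows.items()):
    print(decade, src, "->", dst, n)
```

Cross-genre counts that grow decade by decade would be one signal of a genre "spawning" another. The counts alone do not establish causation.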
- Indicate whether the similarity data, as reported in the data_influence data set, suggest that the identified influencers in fact influence the respective artists. Do the ‘influencers’ actually affect the music created by the followers? Are some music characteristics more ‘contagious’ than others, or do they all have similar roles in influencing a particular artist’s music?
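- Build a social diffusion model (node-importance and node-centrality theory) and compare the community detection results with the artists' actual genres.
Conclusion: the identified influencers do influence the respective artists. The more influential a node, the more its genre matches those of the surrounding nodes, and high-influence influencers show greater musical similarity to the music their followers create.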
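A rough test of whether influencers actually shape their followers' music is to compare the mean feature distance along influencer-to-follower edges with the mean distance over all artist pairs. If linked pairs are much closer, the influence claims are at least consistent with the data. This is a sketch under hypothetical inputs and names; judging significance would need something further, such as a permutation test.

```python
import math
from itertools import combinations
from statistics import mean


def linked_vs_baseline(features, edges):
    """Mean Euclidean feature distance on influence edges vs over all artist pairs.

    features: dict artist -> list of (ideally standardized) feature values
    edges: list of (influencer, follower) pairs
    """
    def dist(a, b):
        return math.dist(features[a], features[b])

    linked = mean(dist(u, v) for u, v in edges if u in features and v in features)
    baseline = mean(dist(a, b) for a, b in combinations(sorted(features), 2))
    return linked, baseline


# Hypothetical artist feature vectors and influence edges.
features = {"1": [0.8, 0.9], "2": [0.75, 0.85], "3": [0.2, 0.1], "4": [0.25, 0.15]}
edges = [("1", "2"), ("3", "4")]
linked, baseline = linked_vs_baseline(features, edges)
print(linked < baseline)  # True
```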
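- Quantitatively analyze the musical characteristics of artists at different levels of influence.
Conclusion: some musical characteristics are more contagious, drawing more musicians toward convergence with the major genres.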
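One hypothetical way to quantify "contagiousness" per feature is to divide the mean absolute difference of that feature along influence edges by its mean absolute difference over all artist pairs. A ratio well below 1 suggests the feature is shared along influence links more than chance would predict. The feature names and values below are illustrative assumptions, and "tempo" stands for a tempo value already rescaled to 0 to 1.

```python
from itertools import combinations
from statistics import mean


def contagion_ratios(features, edges, names):
    """Per-feature ratio: mean |diff| on influence edges / mean |diff| over all pairs."""
    pairs = list(combinations(sorted(features), 2))
    ratios = {}
    for k, name in enumerate(names):
        linked = mean(abs(features[u][k] - features[v][k]) for u, v in edges)
        base = mean(abs(features[a][k] - features[b][k]) for a, b in pairs)
        ratios[name] = linked / base if base else float("nan")
    return ratios


# Hypothetical data: energy is shared along edges, tempo is not.
features = {"1": [0.9, 0.1], "2": [0.9, 0.9], "3": [0.1, 0.2], "4": [0.1, 0.8]}
edges = [("1", "2"), ("3", "4")]
ratios = contagion_ratios(features, edges, ["energy", "tempo"])
print(sorted(ratios, key=ratios.get))  # ['energy', 'tempo']
```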
- Identify if there are characteristics that might signify revolutions (major leaps) in musical evolution from these data?
What artists represent revolutionaries (influencers of major change) in your network?
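- Build an LSTM (long short-term memory) network, and visualize the year-to-year change in musical characteristics (figure) and in genre popularity (figure).
Conclusion: instrumentalness declines, and Pop/Rock boomed unusually in the 1960s, with a lasting effect on the popularity of later genres.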
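The notes model the yearly series with an LSTM. As a much simpler stand-in (not an LSTM), the sketch below flags years whose feature vector in data_by_year jumps unusually far from the previous year's, which is one plausible signature of a "revolution". The threshold `k` and the toy series are hypothetical. The real rows should be standardized first, since tempo (BPM) and loudness (dB) are on different scales.

```python
import math
from statistics import mean, pstdev


def flag_jumps(years, vectors, k=2.0):
    """Flag years whose shift from the previous year exceeds mean + k * std of all shifts.

    vectors: one feature vector per year (e.g. standardized data_by_year rows).
    k: hypothetical sensitivity threshold.
    """
    jumps = [math.dist(vectors[i - 1], vectors[i]) for i in range(1, len(vectors))]
    threshold = mean(jumps) + k * pstdev(jumps)
    return [years[i] for i, j in enumerate(jumps, start=1) if j > threshold]


# Hypothetical series: flat for five years, then an abrupt shift in 1965.
years = list(range(1960, 1970))
vectors = [[0.5, 0.5]] * 5 + [[0.9, 0.1]] * 5
print(flag_jumps(years, vectors))  # [1965]
```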
- Artist influence ranking (table)
Conclusion: almost all of the most influential artists are 1960s Pop/Rock musicians.
- Relate these findings to the social influences of the 1960s United States (to explain why the result is plausible).
- Analyze the influence processes of musical evolution that occurred over time in one genre.
Can your team identify indicators that reveal the dynamic influencers, and explain how the genre(s) or artist(s) changed over time?
Guiding question: how does the influence carried through the propagation process change one genre's musical characteristics over time?
- Visualize how the popularity of one genre changes over time (figure).
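- Build a fuzzy comprehensive evaluation model.
Compute each artist's overall musical-evolution influence from three factors: propagation influence, popularity, and number of songs.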
- Conclusion: the higher the overall musical-evolution influence of a genre's artists, the higher the genre's popularity.
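A minimal sketch of a fuzzy comprehensive evaluation over the three factors above. The grades (low, medium, high), the triangular membership functions, and the factor weights are all hypothetical choices, and each factor is assumed to be min-max scaled to [0, 1] across artists beforehand. The evaluation vector is B = W · R with the weighted-average operator, and the final grade follows the maximum-membership principle.

```python
GRADES = ("low", "medium", "high")


def memberships(x):
    """Triangular membership of a normalized value x in [0, 1] for each grade."""
    return [max(0.0, 1 - 2 * x), max(0.0, 1 - abs(2 * x - 1)), max(0.0, 2 * x - 1)]


def fuzzy_evaluate(values, weights):
    """Fuzzy comprehensive evaluation with the weighted-average operator B = W . R.

    values: normalized factor scores in [0, 1], here (propagation influence,
            popularity, song count)
    weights: factor weights summing to 1 (hypothetical here)
    """
    R = [memberships(v) for v in values]  # factor-by-grade membership matrix
    B = [sum(w * R[i][g] for i, w in enumerate(weights)) for g in range(len(GRADES))]
    return B, GRADES[B.index(max(B))]  # maximum-membership principle


B, grade = fuzzy_evaluate([0.9, 0.8, 0.7], [0.5, 0.3, 0.2])
print(grade)  # high
```

The weights could instead come from expert scoring or the analytic hierarchy process. Varying them is a natural sensitivity check.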
- How does your work express information about cultural influence of music in time or circumstances? Alternatively, how can the effects of social, political or technological changes (such as the internet) be identified within the network?
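- Introduce knowledge-graph theory: a knowledge graph is a comprehensive form of social network analysis that incorporates more feature information.
- The effect of sociocultural background and history on musical development (tied to the 1960s demand for self-expression).
- The effect of technology and the internet on dissemination (changes in music production technology and distribution channels; classical music leans more toward the concert format).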
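- Model optimization and sensitivity analysis
- Social network analysis (complex network model) and community detection algorithms
- Principal component analysis and hierarchical clustering
- Multivariate logistic regression
- Social network diffusion model
- LSTM (long short-term memory) recurrent neural network
- Fuzzy comprehensive evaluation model
- Knowledge graph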
- Summary
Write a one-page document to the ICM Society about the value of using your approach to understanding the influence of music through networks. Considering the two problem data sets were limited to only some genres, and subsequently to those artists common to both data sets, how would your work or solutions change with more or richer data? Recommend further study of music and its effect on culture.
The ICM Society, an interdisciplinary and diverse group from the fields of music, history, social science, technology, and mathematics, looks forward to your final report.
Your PDF solution of no more than 25 total pages should include:
- One-page Summary Sheet.
- Table of Contents.
- Your complete solution.
- One-page document to ICM society.
- References list.
Note: New for 2021! The ICM Contest now has a 25-page limit. All aspects of your submission count toward the 25-page limit: Summary Sheet, Table of Contents, Main Body of Solution, Images and Tables, One-page Document, Reference List, and any Appendices.
Attachments
We provide the following four data files for this problem. THE DATA FILES PROVIDED CONTAIN THE ONLY DATA YOU SHOULD USE FOR THIS PROBLEM.
- influence_data.csv
- full_music_data.csv
- data_by_artist.csv
- data_by_year.csv
Data Descriptions
- influence_data.csv
(Data is encoded in utf-8 to allow for handling of special characters):
- influencer_id: A unique identification number given to the person listed as influencer. (string of digits)
- influencer_name: The name of the influencing artist as given by the follower or industry experts. (string)
- influencer_main_genre: The genre that best describes the bulk of the music produced by the influencing artist. (if available) (string)
- influencer_active_start: The decade that the influencing artist began their music career. (integer)
- follower_id: A unique identification number given to the artist listed as follower. (string of digits)
- follower_name: The name of the artist following an influencing artist. (string)
- follower_main_genre: The genre that best describes the bulk of the music produced by the following artist. (if available) (string)
- follower_active_start: The decade that the following artist began their music career. (integer)
- full_music_data.csv
- data_by_artist.csv
- data_by_year.csv
Spotify audio features from the “full_music_data”, “data_by_artist”, “data_by_year”:
- artist_name: The artist who performed the track. (array)
- artist_id: The same unique identification number given in the influence_data.csv file. (string of digits)
Characteristics of the music:
- danceability: A measure of how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. (float)
- energy: A measure representing a perception of intensity and activity. A value of 0.0 is least intense/energetic and 1.0 is most intense/energetic. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy. (float)
- valence: A measure describing the musical positiveness conveyed by a track. A value of 0.0 is most negative and 1.0 is most positive. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). (float)
- tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration. (float)
- loudness: The overall loudness of a track in decibels (dB). Values typically range between -60 and 0 dB. Loudness values are averaged across the entire track and are useful for comparing the relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). (float)
- mode: An indication of the modality (major or minor) of a track, that is, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
- key: The estimated overall key of the track. Integers map to pitches using standard Pitch Class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value for key is -1. (integer)
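The key and mode fields can be decoded together into a readable key name. This is a small sketch following the standard pitch-class numbering described above; the helper name `key_name` is hypothetical.

```python
PITCH_CLASSES = ["C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
                 "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B"]


def key_name(key, mode):
    """Translate the integer key and mode fields into a readable key name."""
    if key == -1:  # no key detected
        return "unknown"
    return f"{PITCH_CLASSES[key]} {'major' if mode == 1 else 'minor'}"


print(key_name(0, 1), "|", key_name(2, 0))  # C major | D minor
```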
Type of vocals:
- acousticness: A confidence measure of whether the track is acoustic (without technology enhancements or electrical amplification). A value of 1.0 represents high confidence the track is acoustic. (float)
- instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0. (float)
- liveness: Detects the presence of an audience in a track. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live. (float)
- speechiness: Detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks. (float)
- explicit: Detects explicit lyrics in a track (true (1) = yes it does; false (0) = no it does not OR unknown). (Boolean)
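The thresholds in the speechiness and instrumentalness descriptions translate directly into simple bucketing helpers. In this sketch the function names are hypothetical, and assigning the exact boundary values 0.33 and 0.66 to the middle band is an assumption, since the description only says "between".

```python
def speech_band(speechiness):
    """Bucket speechiness using the 0.33 / 0.66 cut-offs from the data description."""
    if speechiness > 0.66:
        return "spoken word"
    if speechiness >= 0.33:
        return "music and speech"
    return "music"


def is_instrumental(instrumentalness):
    """Values above 0.5 are intended to represent instrumental tracks."""
    return instrumentalness > 0.5


print(speech_band(0.5), is_instrumental(0.8))  # music and speech True
```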
Description:
- duration_ms: The duration of the track in milliseconds. (integer)
- popularity: The popularity of the track. The value will be between 0 and 100, with 100 being the most popular. The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are. Generally speaking, songs that are being played more frequently now will have a higher popularity than songs that were played more frequently in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. Artist and album popularity are derived mathematically from track popularity. (integer)
- year: The year of release of a track. (integer from 1921 to 2020)
- release_date: The calendar date of release of a track, mostly in yyyy-mm-dd format; however, the precision of the date may vary, and some dates are given only as yyyy.
- song_title (censored): The name of the track. (string) Software was run to remove any potential explicit words in the song title.
- count: The number of songs by a particular artist represented in the full_music_data.csv file. (integer)
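Because release_date precision varies, it can help to parse it into a year plus a precision label before any time-series work, and to cross-check the result against the year column. This sketch is hypothetical: the function name is illustrative, and handling a yyyy-mm form is an assumption, since the description only mentions yyyy-mm-dd and yyyy.

```python
def parse_release_date(value):
    """Return (year, precision) for a release_date such as '1965-08-06' or '1971'."""
    parts = value.strip().split("-")
    precision = {1: "year", 2: "month", 3: "day"}.get(len(parts), "unknown")
    return int(parts[0]), precision


print(parse_release_date("1965-08-06"))  # (1965, 'day')
print(parse_release_date("1971"))        # (1971, 'year')
```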