STEM 1: Linguistic Prosody in Relation to Musical Chords and Syntax

My STEM project is focused on the intersections between music and language, specifically the connections between the contour, or direction, of the voice and of melodic lines, as well as the distance between individual syllables and chords, across different languages. The lit review on the STW page can provide more information about my project.


Music is an incredibly universal form of communication. Even when listeners do not speak the language a piece of music is recorded in, they can often understand its emotion. This project looked into the possible connections between language and music, specifically focusing on the change in melodic distance based on which pitch classes are present in music across a time interval and the change in perceived pitch of speech over time. Shona, Hmong, and English were chosen to be the main languages studied because of the unique ways each uses tone: Shona as a grammatical function, Hmong as a way of changing the meaning of a word, and English to indicate emotion or a question, a less crucial function in the language. Eight music samples and eight spoken samples were collected for each language studied, with pitch classes and Mel frequency coefficients being extracted from music and speech samples respectively using the Librosa library for Python. A possible tonal correlation between speech and music could provide evidence for a connection in brain processing of speech and music, and could lead to the development of new music therapies for speech disorders such as aphasia.

Phrase 1

How does musical syntax relate to prosody and linguistic syntax at the syllable, word, and sentence level, and how does this relationship vary across tonal and non-tonal languages?

Phrase 2

If different tonal languages are analyzed, musical syntax will bear relation to linguistic prosody and syntax in terms of tone, intervals, chords and structure because of the connection in how musical and linguistic syntax are processed in the brain.

Background Infographic

More info: Language and communication are fundamental cornerstones of culture, and there are countless ways people communicate, whether verbally or non-verbally. Speech, or the act of uttering words and phrases, is a major form of verbal communication. Music is another way to communicate through sound: it can also convey emotion and tell stories, and it is just as crucial an outlet for expression (Han, 2011). However, the fact that both of these means of communication are carried by sound is not the only similarity they share. Connections have been established both between the brain processing of language and of music and between the harmonics of music and the tonality of language (Patel, 2003; Schellenberg, 2009). These connections can differ across languages, and many specifics have yet to be found, such as the exact brain location of processing for each and the language-specific correlation between spoken tone and melody (Schellenberg, 2009). There is much opportunity for more research into this correlation across a wider variety of languages. Although much remains unknown, past researchers have made many discoveries that helped establish this connection. Brain processing of music and speech has been found to be related through brain injuries that result in parallel difficulty in understanding speech and music (Patel, 2003). Studies of tonal and non-tonal language speakers have examined pitch discernment and the development of the brain processes relating to this ability, showing a superior ability in tonal language speakers to identify musical notes and just-learned tonal words (Best, 2019; Siddique, 2013).
Past studies have provided a stable framework for analyzing pieces of music based on intervals between individual pitches and melodic segments as well as overall vocal contour (Han, 2011; Schellenberg, 2009), and a greater connection between tonal languages and music has been established. These past advancements have led to a greater understanding of the levels of connection between spoken language and music, from brain processing to harmonics. However, there is a need for more research across a wider variety of tonal languages because of the vast differences in how languages use tone and in the traditional music that originates in different places. There is also a need for analysis based on elements of musical syntax, especially how a rhythmic framework influences pitch perception (Lerdahl, 1988). Current studies take into account pitches of individual notes, yet sometimes fail to look at those pitches within a wider framework (Schellenberg, 2009). Identifying the rhythmic conditions under which tonal and musical correlation exists for a language would substantially expand current knowledge.

Procedure Infographic

More info: Eight musical samples and eight spoken samples were collected for each language studied. Shona, Hmong, and English were chosen as the primary languages because of the differing ways they use tone: English is a stress language, Shona is a pitch accent language, and Hmong is a contour tone language. Files were collected using YouTube to MP3 as well as the San Diego Hmong Language Project, an online database with audio samples of songs and speech in the Hmong language. Online talks and news broadcasts were a vital resource for speech analysis, and each cut selection was listened to in full to ensure there was no background music or laughter. The files were stored as WAV files, MP3 files, or Audacity projects, with Audacity used to convert between MP3 and WAV. The songs were then cut down to 30-second samples, chosen randomly from the middle of the recording. For the analysis of musical samples, a Python library called Librosa was used. A filtered chromagram and a semitone spectrogram were graphed for each song. Then, a segment of code stepped through each song and obtained a chromagram value for each individual block of time. To compute distance across the 12 pitch classes detailed in the chromagram, the distance formula was used. Librosa was also used for analysis of the vocal samples. Similarly, the program ran through each spoken audio sample and returned frequency values for different time increments. Distance between these values was calculated using subtraction.
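The two distance calculations described above can be sketched roughly as follows. This is only an illustration and not the project's actual code. It assumes that "the distance formula" means the Euclidean distance between the 12 pitch-class values of consecutive time blocks, and that the speech distance is the signed difference between consecutive frequency values. The function names and sample numbers are hypothetical, and the inputs are plain lists, since the exact Librosa calls used to produce them are not given here.

```python
import math


def chroma_distances(frames):
    """Euclidean distance between each pair of consecutive chroma frames.

    `frames` is a list of time blocks, each holding 12 pitch-class
    strengths (C, C#, ..., B). This input layout is an assumption.
    """
    distances = []
    for prev, curr in zip(frames, frames[1:]):
        if len(prev) != 12 or len(curr) != 12:
            raise ValueError("each frame must hold 12 pitch-class values")
        # Distance formula applied across all 12 pitch classes.
        squared = sum((c - p) ** 2 for p, c in zip(prev, curr))
        distances.append(math.sqrt(squared))
    return distances


def pitch_differences(freqs_hz):
    """Signed change in frequency between consecutive time increments."""
    return [b - a for a, b in zip(freqs_hz, freqs_hz[1:])]


# Hypothetical data: energy moves from pitch class C to C#, then stays put.
only_c = [1.0] + [0.0] * 11
only_c_sharp = [0.0, 1.0] + [0.0] * 10
music_steps = chroma_distances([only_c, only_c_sharp, only_c_sharp])

# Hypothetical speech frequencies in Hz for three time increments.
speech_steps = pitch_differences([220.0, 247.0, 233.0])
```

Each function returns one value per step between blocks, so a sample with n time blocks yields n - 1 distances for the music and for the speech.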


Figure 1 Figure 2 Figure 3 Figure 4



Discussion and Conclusion

Potential sources of error include the range of time over which the samples were taken: while all samples are from the twenty-first century, the English samples are all from the past two years, while the Shona and Hmong samples are spread over the past ten or even twenty years. Another potential source of error is the quality of the audio recordings: the English samples were all well-known, successful pop songs from major labels that may have been recorded at higher quality. Some significant error in the vocal samples can be explained by the fact that one podcast episode each for Shona and Hmong was recorded into a computer while being played from a phone. Both of these samples have a substantial spike at the end unlike any of the other samples taken for that language, and when they are ignored, the results change significantly in favor of the hypothesis. All other samples were downloaded straight from the Internet, without the possibility of error in a transition stage. Additional error in the vocal samples could come from the fact that people naturally have different speaking voices and speak at different pitches. To account for this, only the distance between frequencies was examined, not the individual frequencies themselves. However, it would be more useful to look at these changes in proportion to the overall speaking range of the person talking. This could also be improved by taking a much wider range of samples, or by studying only one person’s speaking voice. With the current methods, however, it would be very time consuming to collect many samples. A possible extension far in the future could be the development of a program that automatically collects and analyzes samples from the Internet, creating an expanding database of analysis.
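One way the proportional comparison suggested above might look is sketched below. It is a hypothetical illustration and not part of the project's code: it assumes the "overall speaking range" can be approximated by the highest minus the lowest frequency in a sample, measured in Hz, and the function name and numbers are made up.

```python
def normalized_pitch_changes(freqs_hz):
    """Express each step-to-step pitch change as a fraction of the
    speaker's range within the sample (max minus min frequency)."""
    if len(freqs_hz) < 2:
        return []
    span = max(freqs_hz) - min(freqs_hz)
    if span == 0:
        # A perfectly flat sample has no range, so no relative change.
        return [0.0] * (len(freqs_hz) - 1)
    return [(b - a) / span for a, b in zip(freqs_hz, freqs_hz[1:])]


# Two hypothetical speakers with the same contour at different pitches.
lower_voice = normalized_pitch_changes([100.0, 150.0, 200.0])
higher_voice = normalized_pitch_changes([200.0, 300.0, 400.0])
```

In this made-up case the raw differences are 50 Hz per step for one speaker and 100 Hz per step for the other, yet both produce the same normalized changes, which is the kind of speaker-independent comparison described above.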
If a correlation between music and vocal samples of the same language is found, it would be a strong foundation for further study. This study uses equal time intervals at random locations in both musical and spoken samples, which is a good baseline, but future research should take into account the rhythmic framework and structure of a musical piece. Code was written in Python to do this, but it still has a few bugs that are being worked out. Additionally, in a larger study, it would be beneficial to vary the time period the audio samples were taken from and compare correlation and variance across these groups.


Here are all of my references for my STEM project.

February Fair Board

You may notice the layout is similar to that of my Physics lab board; that’s because I used the same basic template for both. Here is all of the information I had on my board at the STEM fair in February, which basically summarized all of the work I did on my project up until February.

December Fair Board

You may notice the layout is similar to that of my Physics lab board; that’s because I used the same basic template for both. Here is all of the information I had on my board at the STEM fair in December, which basically summarized all of the work I did on my project up until December.

Project Proposal

This was the first major assignment we had related to our STEM project. The layout of the document included most of the questions we had to answer when submitting to SEFOS for approval. SEFOS basically just makes sure all of the projects are safe and don’t have harmful procedures.