Tuesday 28 August 2018

Similarity test algorithm

A plagiarism checker algorithm has been developed to calculate an index that is proportional to the melodic similarity between samples. It considers melody, rhythm, harmony and some other factors as well. Some of these factors are calculated in a more elaborate way to yield a more reasonable value.

The melody factor considers same pitch, different pitch (which decreases the score), closely timed notes, intervals, repetitions, ...
The rhythm factor considers note locations and, to some extent, rests as well.
The harmony factor considers chord changes, weighted by how usual or unusual they are.
The "others" factor considers tempo, location (section, phrase, bar) and similarity of instrumentation. More details are available for those who are interested.

The test is still being fine-tuned.
The preliminary test results are below; they may still change a bit up or down. Keep in mind that the proposed limit is around 8.0 and has to be handled carefully. Between 7.0 and 9.0 there is a "gray" range, but outside of it the case is more or less black or white.
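
To make the limit and the gray range concrete, here is a minimal Python sketch. It is only an illustration: the function names, the labels and the treatment of values exactly at 7.0 or 9.0 are assumptions, not part of the actual checker. The helper also converts the decimal commas used in the results.

```python
# Thresholds quoted in the text; how the boundaries themselves are
# treated (inclusive here) is an assumption.
PROPOSED_LIMIT = 8.0
GRAY_LOW, GRAY_HIGH = 7.0, 9.0


def parse_index(text):
    """Convert a decimal-comma figure such as '11,96' to a float."""
    return float(text.strip().replace(",", "."))


def classify(index):
    """Place a similarity index relative to the gray range."""
    if index < GRAY_LOW:
        return "below gray range"
    if index <= GRAY_HIGH:
        return "gray range"
    return "above gray range"


if __name__ == "__main__":
    for figure in ("11,96", "7,40", "5,14"):
        print(figure, "->", classify(parse_index(figure)))
```

An index inside the gray range can still be under or over the proposed limit of 8.0, which is why such cases need to be looked at individually.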

1) Stay With Me vs. I Won't Back Down
Similarity index: 11,96

Melody: 9,23
Rhythm: 1,14
Harmony: 1,02
Others: 1,11

Clear case.
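
The post does not say how the four factors are combined into the index. The figures of this case are roughly consistent with a plain product, so the sketch below assumes that; the function name and the multiplicative form are assumptions, not the confirmed formula.

```python
from math import prod


def similarity_index(melody, rhythm, harmony, others):
    """Combine the four factors by multiplication (assumed, not confirmed)."""
    return prod((melody, rhythm, harmony, others))


if __name__ == "__main__":
    # Factors published for Stay With Me vs. I Won't Back Down.
    case_1 = similarity_index(9.23, 1.14, 1.02, 1.11)
    print(round(case_1, 2))
```

The product comes to about 11,91, a little under the published 11,96; the gap may be down to the factors being rounded to two decimals.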

2) Blurred Lines vs. Got To Give It Up

2a) Blurred Lines vs. Got To Give It Up - "signature phrase"
Similarity index: 2,83

Melody: 2,8
Rhythm: 1,02
Harmony: 0,92
Others: 1,08

This result was maximized by trimming off the non-matching notes. Without the trimming, the entire phrase would yield a negative value because of the many differing notes.
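
How trimming can turn a negative whole-phrase value into a positive one can be sketched as follows. This is a hypothetical illustration, not the checker's actual method: it assumes each compared note gets a positive score for a match and a negative one for a mismatch (the numbers are invented), and that the kept notes form one contiguous run. The run with the highest total is found with the standard maximum-subarray scan.

```python
def best_trimmed_span(scores):
    """Return (total, start, end) of the contiguous run with the highest sum.

    `scores` is a non-empty list of per-note values; `end` is exclusive.
    """
    best_sum, best_start, best_end = scores[0], 0, 1
    cur_sum, cur_start = 0.0, 0
    for i, score in enumerate(scores):
        if cur_sum <= 0:
            # A non-positive running total cannot help: restart the run here.
            cur_sum, cur_start = score, i
        else:
            cur_sum += score
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1
    return best_sum, best_start, best_end


if __name__ == "__main__":
    # Invented per-note scores: +1 match, -0.5 near miss, -1 mismatch.
    phrase = [-1.0, 1.0, 1.0, -0.5, 1.0, -1.0, -1.0, -1.0]
    print("whole phrase:", sum(phrase))
    print("best trimmed span:", best_trimmed_span(phrase))
```

With these invented scores the whole phrase sums to a negative value, while the trimmed span (notes 1 to 4) stays positive.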

2b) Blurred Lines vs. Got To Give It Up - "hook"
Similarity index: 3,91
Melody: 3,45
Rhythm: 1,09
Harmony: 1,06
Others: 0,98

By far the highest result in the case. Only one perfect match, plus three close ones.

2c) Blurred Lines vs. Got To Give It Up - bass
Similarity index: 1,48

Melody: 1,23
Rhythm: 1,01
Harmony: 0,95
Others: 1,25

This result, too, was maximized by trimming off the non-matching notes. Without the trimming, the entire phrase would yield a negative value because of the many differing notes.

2d) Blurred Lines vs. Got To Give It Up - 5 to 1 bass motif
Similarity index: 1,30

Melody: 1,72
Rhythm: 0,80
Harmony: 0,79
Others: 1,19


2e) Blurred Lines vs. Got To Give It Up - hey-hey-hey
Similarity index: 0,45

Melody: 0,70
Rhythm: 0,80
Harmony: 0,79
Others: 1,02

This is the lowest index; even trimming could not save it. It was nevertheless pointed out as a similar motif by musicologists and later described in testimony as substantially similar (along with all the other points).

2f) Blurred Lines vs. Got To Give It Up - "keep on dancin"
Similarity index: 0,93

Melody: 1,10
Rhythm: 1,00
Harmony: 0,91
Others: 0,93

Summary of the six Blurred Lines vs Got To Give It Up samples:
a:2,76
b:4,31
c:1,48
d:1,37
e:0,45
f:0,93

Samples c to f range from 0,55 to 1,43. We could just say "no comment", but it cries out for a comment. These are ridiculously low values to label as substantially similar, or even just similar. The expert for the Gaye party claimed in her testimony that each of these was substantially similar - in the musicological meaning of the word.

Also note that none of these patterns occur simultaneously or consecutively. Now think over what percentage of randomly chosen (pop) songs contain melodic coincidences at least as strong as 4,17 and 2,76. See the blog article Accidental similarity.


3) Blurred Lines vs Another One Bites The Dust
Similarity index: 3,46
It's just a melismatic motif with nine (!) consecutive matching notes that follow a commonplace pattern. The algorithm effectively compensates for repeated commonplace motifs.


Melody: 4,6
Rhythm: 1,11
Harmony: 0,8
Others: 0,9


4) Sweet Child Of Mine vs. Unpublished Critics
Similarity index: 5,72

Melody: 4,1
Rhythm: 1,03
Harmony: 1,1
Others: 1,28

This refers only to the verse melodies. As in 2a), the result would be much lower (a negative value) if the comparison considered the entire phrase. To get a higher result, the non-matching notes were trimmed from the melodic comparison. In this case there were other similar details as well.


5) Creep vs. Air That I Breathe
Similarity index: 9,14

Melody: 7,01
Rhythm: 1,13
Harmony: 1,15
Others: 1,00

The compared pattern in Creep is the falsetto melody sung after the "solo".


6) Get Free vs. Creep
Similarity index: 9,64 (it depends on which phrases are compared!)

Note that in this case the melodies in Creep that the complaint concerns are different from those that are similar to the Air That I Breathe. The two cases are melody-wise independent from each other.
The melodies in this case are only partly similar. Some phrases are rather different. The rough placement of the phrases is similar in both songs: they start 2-3 beats before the downbeat of the actual harmonic phrase (where the chords change).
Both songs have two different verses: slightly different in Get Free, more different in Creep (phrases 3 and 4). To maximize the matching notes I had to take the closer variant of the verses, which is the first verse in Creep.
The highest result was given by considering phrases 3-4 of the verse through phrases 1-2 of the chorus. This is a "cheat" in favour of Creep, since these phrases are not consecutive with the chorus phrases. Without this cheat the index would not reach the proposed limit of 8.0!

Melody: 8,5
Rhythm: 0,87
Harmony: 1,18
Others: 1,10

7) Photograph vs. Amazing
The highest score comes from the first ABB sequence of phrases, which shows a similarity index of 10,90 according to the algorithm. The non-repeated phrases would yield a lower value.

Melody: 9,03
Rhythm: 1,06
Harmony: 1,09
Others: 1,04


8) Come As You Are vs. Eighties
Similarity index: 12,64

Melody: 9,13
Rhythm: 1,03
Harmony: 1,08
Others: 1,24

13,81 considering the repetitions.

Clear case? Not quite! Just to mess things up:

Eighties (1985) vs. Life Goes On (1982)
Similarity index: 12,06 or 16,88 considering the repetitions.

Come As You Are vs. Life Goes On
Similarity index: 10,52
11,19 considering the repetitions.


9)
Love Is A Wonderful Thing (Isley Brothers) vs. Da Doo Ron Ron
Similarity index: 7,40
Under the limit.

Melody: 6,62
Rhythm: 1,06 (the shuffle beat difference is accounted for in the "others" factor, as 0,9)
Harmony: 1,02
Others: 1,04

10)
Love Is A Wonderful Thing (Michael Bolton)
vs.
Love Is A Wonderful Thing (Isley Brothers)
Similarity index: 6,78

Melody: 3,90
Rhythm: 1,14
Harmony: 0,98
Others: 1,56 (the four identical words alone contribute a gain of 1,2)

This best result came from choosing the title phrase variant that occurs only once in Bolton's song, next to the sax solo. The most frequently occurring Bolton variants resulted in an index of 3,82.

11)
Thinking Out Loud vs. Let's Get It On

11a)
The basic bass loop.
Similarity index: 7,37

This is a surprisingly high index for a four-note melody. It accounts for the looping with a "gain" of 1,4. Since it is a commonplace motif even in prior art, it does not matter much.

Melody: 6,30
Rhythm: 1,10 (if the 3+5 pattern were not commonplace, this factor would be higher)
Harmony: 1,04
Others: 1,80

11b)
TOL verse 1st phrase vs. LGIO chorus 3rd phrase
The opening notes, the title phrase in LGIO, are a traditional fanfare motif. The compared fragment is a melismatic motif in LGIO: 3-4 notes only, since the rest is rather different.
Similarity index: 3,46

Melody: 3,85
Rhythm: 0,87
Harmony: 1,07
Others: 0,97

11c)
TOL verse 2nd phrase vs. LGIO chorus 4th phrase
(the 3 5 6 5 3 motif)
Similarity index: 2,81

Melody: 2,9
Rhythm: 1,13
Harmony: 0,93
Others: 0,92

11d)
TOL verse with LGIO verse
Very different melodies. There is a two note fragment that is "similar".
Similarity index: 1,52

Melody: 1,49
Rhythm: 0,9
Harmony: 1,06
Others: 1,08

12)
Walk vs. Nem Vagyok Tökéletes
Similarity index: 9,25

Melody: 8,36
Rhythm: 1,03
Harmony: 1,10
Others: 0,98

The "complaining" song is from a Hungarian band. Access is improbable in this case, so the similarity must be accidental, even though the index is above the gray range. The calculation considers the repetition. The home key is the same, and so are the chords.

13)
Shape Of You vs. No Scrubs
Similarity index: 5,14

Melody: 5,33
Rhythm: 1,10
Harmony: 0,85
Others: 1,03

There is a similar passage indeed, but the strength of the similarity does not even come close to the "gray" range. It's above the "usual" level of the Marvin Gaye cases, though...
See the Accidental similarity article for an example of a melodic factor of 4,8 occurring accidentally between two of three randomly chosen songs.


14)
Thinking Out Loud vs. Forget You
Similarity index: 7,11

Melody: 7,40
Rhythm: 1,01
Harmony: 0,96
Others: 0,99

A close one, at the low end of the "gray" range. A stronger similarity than that of Shape vs. Scrubs...

15)
Ice Ice Baby vs. Under Pressure
Similarity index: 12,20

Melody: 9,21
Rhythm: 1,15
Harmony: 1,06
Others: 1,08

This one was a case of sampling. The similarity works out as if it were a simple "rip-off".

16)
Firework vs. Always
Similarity index: 7,14

Melody: 7,9
Rhythm: 1,02
Harmony: 1,04
Others: 1,04

Some different notes save Firework.

17)
Stairway To Heaven vs. Taurus
Similarity index: 4,70

Melody: 3,50
Rhythm: 1,02
Harmony: 1,02
Others: 1,30

This was a special test, as some beats had two notes playing simultaneously. Even without taking commonplace motifs into account, the melodic similarity is still under the "limit".
There are certainly many identical and close notes, but the different notes (for example the open B string notes of Taurus and the top notes of Stairway) hold back the result.
Taurus has a "twin" song called Summer Rain that was recorded in roughly the same months. These two songs share 11 consecutive notes.

18)
Starboy vs. Yooho
Similarity index: 9,39

Melody: 6,3
Rhythm: 1,08
Harmony: 1,05
Others: 1,30

The melodic similarity by itself is not strong enough, but many other factors amplify it: BPM, chords, key, instrumentation, location, ...
The best result is obtained by comparing the first phrases only.


