Thursday, 19 September 2013

How to get the a, b & c parameters of a Zipf distribution of rank/frequency

I'm trying to approximate the number of occurrences of the word with rank
r in a text corpus. I know that rank/frequency follows a power law. I have
sorted all the words by frequency, so their position is their rank (the
1st word is the one that appears most often, and so on). What I am trying
to fit is the following version of Zipf's law for word rank/frequency:
Fr = c / (r+b)^a
where r is the rank of a given word and Fr is its frequency.
I've been using some R scripts that I found, and I managed to produce a
log-log plot, so I can see that my data does in fact follow a power law
(the log-log plot is linear and, as the theory predicts, runs from
top-left to bottom-right). But I have no clue how to get the c, b & a
values. I also have the best quadratic fit, but I'm not sure what it
means (is it the value of the "a" parameter?). This is what R reported:
$minimum
[1] 1.134048
$objective
[1] 15756.57
And I was told that this 1.13 is the best quadratic fit, but I don't know
how to translate it into what I'm looking for.
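
For reference, here is one possible way to estimate all three values. It is
a rough sketch under stated assumptions, not the method the R scripts use.
Taking logs of the formula gives log(Fr) = log(c) - a*log(r+b), so for any
fixed b the relation is a straight line in log(r+b): the slope is -a and
the intercept is log(c). The sketch tries a grid of b values, fits the line
by ordinary least squares for each one, and keeps the b with the smallest
squared error. It is written in Python rather than R, and the function
names, the grid range (b_max) and the grid resolution (steps) are all
illustrative choices. As a side note, a $minimum / $objective pair is the
shape of result that R's optimize() returns (the argument at the minimum
and the function value there), so the 1.13 may be a single optimised
parameter rather than all three.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, sum_of_squared_residuals).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def fit_zipf_mandelbrot(freqs, b_max=20.0, steps=2001):
    """Estimate (a, b, c) in Fr = c / (r + b)^a.

    freqs must be positive and sorted in descending order, so that
    freqs[0] belongs to rank 1. b_max and steps define the grid of
    candidate b values; both are arbitrary choices for this sketch.
    """
    log_f = [math.log(f) for f in freqs]
    ranks = range(1, len(freqs) + 1)
    best = None
    for i in range(steps):
        b = b_max * i / (steps - 1)
        log_r = [math.log(r + b) for r in ranks]
        intercept, slope, sse = fit_line(log_r, log_f)
        if best is None or sse < best[3]:
            # slope is -a and intercept is log(c) in the log-log line
            best = (-slope, b, math.exp(intercept), sse)
    a, b, c, _ = best
    return a, b, c


if __name__ == "__main__":
    # Synthetic frequencies generated from known parameters, only to show
    # the call; real input would be the sorted word counts of the corpus.
    synthetic = [30000.0 / (r + 1.5) ** 1.2 for r in range(1, 301)]
    a, b, c = fit_zipf_mandelbrot(synthetic)
    print("a =", a, "b =", b, "c =", c)
```

Fitting on the log scale gives every rank the same weight, so the long
tail of rare words influences the result much more than the few top-ranked
words do. If that matters, a nonlinear least-squares fit on the raw
frequencies (for example with R's nls) is a common alternative.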
