Could you explain more about why KTO or other unpaired methods wouldn't have similar issues with off-policy data? If the data is off-policy, I would expect the user's unpaired ratings to change often, since the likelihood of the possible alternatives has changed.
Dec 19, 2024, 6:54 PM
{
"text": "Could you explain more why KTO or other unpaired methods wouldn't have similar issues with off-policy data?\n\nIf the data is off-policy, my expectation would be that the users unpaired ratings would often change since the likelihood of possible alternatives has changed.",
"$type": "app.bsky.feed.post",
"langs": [
"en"
],
"reply": {
"root": {
"cid": "bafyreic44c5fbtlfsjkjjhhxvg7sdvjkk556vruetrh4amdmebp2orffzq",
"uri": "at://did:plc:brkj2yocng7vtggmyujy4khq/app.bsky.feed.post/3ldjl7torno2t"
},
"parent": {
"cid": "bafyreigmb6tun6pmpaakcsdc7cau5kmgncxzl4kghso6ozs3ixlgbtl2mi",
"uri": "at://did:plc:j7tmwpecoad43t6jhp5t5ovn/app.bsky.feed.post/3ldohfhgjfc2b"
}
},
"createdAt": "2024-12-19T18:54:48.210Z"
}