This #EMNLP2024 best paper aclanthology.org/2024.emnlp-m... had large gains over their (somewhat weak) baseline in trying to determine whether a given document was in an LLM's pre-training data. Progress on an important problem.
Dec 20, 2024, 7:24 PM
{
  "text": "This #EMNLP2024 best paper aclanthology.org/2024.emnlp-m... had large gains over their (somewhat weak) baseline in trying to determine if a given document was in a LLMs pre-training data. Progress in an important problem.",
  "$type": "app.bsky.feed.post",
  "embed": {
    "$type": "app.bsky.embed.external",
    "external": {
      "uri": "https://aclanthology.org/2024.emnlp-main.300/",
      "thumb": {
        "$type": "blob",
        "ref": {
          "$link": "bafkreiglhrmfik7r4xsao63ysyvbt3wwmukrkfgzftxbmenmboae65vl7m"
        },
        "mimeType": "image/jpeg",
        "size": 473200
      },
      "title": "Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method",
      "description": "Weichao Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024."
    }
  },
  "langs": [
    "en"
  ],
  "facets": [
    {
      "index": {
        "byteEnd": 15,
        "byteStart": 5
      },
      "features": [
        {
          "tag": "EMNLP2024",
          "$type": "app.bsky.richtext.facet#tag"
        }
      ]
    },
    {
      "index": {
        "byteEnd": 59,
        "byteStart": 27
      },
      "features": [
        {
          "uri": "https://aclanthology.org/2024.emnlp-main.300/",
          "$type": "app.bsky.richtext.facet#link"
        }
      ]
    }
  ],
  "createdAt": "2024-12-20T19:24:08.880Z"
}
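The two facets in the record above mark spans of the post text by byteStart and byteEnd. A minimal Python sketch of how those spans can be recovered, assuming the offsets count UTF-8 bytes with an exclusive end (which matches the values in this record); facet_slice is a hypothetical helper name, and the text is shortened to the prefix the facets cover:

```python
# Prefix of the record's "text" field; the facets only touch the first 59 bytes.
text = "This #EMNLP2024 best paper aclanthology.org/2024.emnlp-m... had large gains"


def facet_slice(text: str, byte_start: int, byte_end: int) -> str:
    """Return the substring covered by a facet index.

    Assumption: offsets are positions in the UTF-8 encoding of the text,
    with byte_end exclusive. facet_slice is an illustrative helper, not
    part of any library.
    """
    return text.encode("utf-8")[byte_start:byte_end].decode("utf-8")


tag_span = facet_slice(text, 5, 15)    # span for the facet#tag feature
link_span = facet_slice(text, 27, 59)  # span for the facet#link feature
print(tag_span)
print(link_span)
```

Slicing the encoded bytes rather than the string matters once the text contains non-ASCII characters, where character positions and byte positions diverge.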