Petroleum Science Bulletin, 2024, Vol. 9, Issue 5: 750-763. DOI: 10.3969/j.issn.2096-1693.2024.05.057
Intelligent named entities recognition for drilling engineering by integrating rotational position embedding and masked conditional random fields
CAO Qianwen, LI Wei, LIN Botao*, JIN Yan, HAN Xueyin, ZHANG Jiahao
1 College of Artificial Intelligence, China University of Petroleum-Beijing, Beijing 102249, China
2 College of Petroleum Engineering, China University of Petroleum-Beijing, Beijing 102249, China
3 China National Offshore Oil Corporation Energy Development Co., Ltd., Engineering Technology Branch, Tianjin 300452, China

Abstract

Drilling engineering reports record geological information about oil and gas reservoirs as well as various drilling engineering parameters. The automatic extraction of unstructured information from these reports can significantly improve the efficiency of data integration into data lakes, thereby enabling more efficient data management. However, these reports typically have domain-specific characteristics, and the diversity of their structure and language presents considerable challenges for accurate named entity recognition (NER). Currently, deep neural network models commonly used for NER are typically trained or fine-tuned on small-scale annotated datasets, leading to two main issues. First, the lack of large-scale annotated corpora limits the diversity of training samples, which in turn causes poor performance when the model encounters new or unseen data, decreasing the model’s generalization ability across different types of data. Second, existing models lack the ability to effectively model long-distance contextual information in texts. Since relevant entities may be scattered across long text segments in drilling engineering reports, these methods often struggle to capture and recognize relationships between named entities in complex documents. To address the aforementioned issues, this paper proposes an intelligent method for named entity recognition in drilling engineering that integrates rotational position embedding and masked conditional random fields. The proposed method is based on a Transformer encoder, a bidirectional long short-term memory network (BiLSTM), and a conditional random field (CRF) architecture. The Transformer encoder leverages pre-trained language models to provide rich contextual semantic representations, BiLSTM captures sequential dependencies, and CRF is used for sequence labeling. 
Moreover, the traditional CRF is improved by designing a masked modeling mechanism, which restricts the generation of inverted sequences, thereby enhancing the consistency of sequence labeling order. The integration of rotational position embedding further enhances the model's awareness of relative positional information in the text, allowing the model to better capture dependencies between distant words. This improves the model's ability to recognize named entities spread across larger contextual ranges. In addition to model improvements, this paper also addresses the issue of insufficient training data by constructing a domain-specific named entity corpus. This corpus includes annotations for 12 categories of entities, covering a total of 20,727 entity labels across 4,000 text segments. This enriched dataset provides more diverse training samples, which helps improve the model's generalization ability. Experimental results show that the proposed model achieves an F1 score of 86.49 on the test set, representing an improvement of 2.65 percentage points over the previous best-performing model. Furthermore, the model demonstrates significant improvements in recognizing entities with long-tail distributions, which are often underrepresented in typical training datasets. This method not only expands the application of named entity recognition in the field of drilling engineering but also provides engineers with an efficient tool for extracting critical information. By accelerating the analysis of drilling data, it improves the efficiency of drilling operations management and enhances data lake integration, ultimately bringing positive impacts to the decision-making process in drilling projects.
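
The masked CRF idea described in the abstract can be illustrated with a minimal Python sketch. It assumes a BIO tagging scheme and reads "inverted sequences" as illegal tag orders, such as an I- tag that does not follow a B- or I- tag of the same entity type; the abstract confirms neither detail. The tag names, the scores, and the helpers `is_allowed`, `mask_transitions`, and `viterbi` are hypothetical and are not the authors' implementation.

```python
NEG_INF = float("-inf")
# Hypothetical tag set; the paper's corpus has 12 entity categories.
TAGS = ["O", "B-WELL", "I-WELL", "B-DEPTH", "I-DEPTH"]


def is_allowed(prev_tag, next_tag):
    """BIO rule: I-X may only follow B-X or I-X of the same entity type."""
    if next_tag.startswith("I-"):
        entity = next_tag[2:]
        return prev_tag in ("B-" + entity, "I-" + entity)
    return True


def mask_transitions(scores):
    """Copy the transition scores, setting illegal transitions to -inf."""
    return [
        [s if is_allowed(TAGS[i], TAGS[j]) else NEG_INF for j, s in enumerate(row)]
        for i, row in enumerate(scores)
    ]


def viterbi(emissions, transitions, start):
    """Return the highest-scoring tag sequence under the given scores."""
    n = len(TAGS)
    score = [start[j] + emissions[0][j] for j in range(n)]
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    path = [max(range(n), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return [TAGS[j] for j in reversed(path)]


# Made-up emission scores for three tokens (one row per token, one column per tag).
emissions = [
    [2.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 2.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
]
zero = [[0.0] * len(TAGS) for _ in TAGS]
free_start = [0.0] * len(TAGS)
# A sequence may not begin with an I- tag.
masked_start = [NEG_INF if t.startswith("I-") else 0.0 for t in TAGS]

plain = viterbi(emissions, zero, free_start)
masked = viterbi(emissions, mask_transitions(zero), masked_start)
print(plain)   # ['O', 'I-WELL', 'O']
print(masked)  # ['B-WELL', 'I-WELL', 'O']
```

With all-zero transition scores, unconstrained decoding simply follows the emissions and produces an orphan I-WELL after O. Masking the illegal transitions forces the decoder onto the best legal path, which opens the entity with B-WELL.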


Key words: named entity recognition; drilling engineering; transformer encoder; natural language processing; deep learning
Received: 2024-10-31
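Funding: Jointly supported by the National Natural Science Foundation of China (No. 62402526) and the Research Start-up Fund of China University of Petroleum-Beijing (No. 2462024BJRC013).
Corresponding author: [email protected]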
Cite this article:
CAO Qianwen, LI Wei, LIN Botao, JIN Yan, HAN Xueyin, ZHANG Jiahao. Intelligent named entities recognition for drilling engineering by integrating rotational position embedding and masked conditional random fields. Petroleum Science Bulletin, 2024, 09(05): 750-763.
Copyright 2016, Petroleum Science Bulletin Editorial Office