Petroleum Science Bulletin, 2024, Vol. 9, Issue 5: 750-763. DOI: 10.3969/j.issn.2096-1693.2024.05.057
Intelligent named entities recognition for drilling engineering by integrating rotational position embedding and masked conditional random fields
CAO Qianwen, LI Wei, LIN Botao*, JIN Yan, HAN Xueyin, ZHANG Jiahao
1 College of Artificial Intelligence, China University of Petroleum-Beijing, Beijing 102249, China
2 College of Petroleum Engineering, China University of Petroleum-Beijing, Beijing 102249, China
3 China National Offshore Oil Corporation Energy Development Co., Ltd. Engineering Technology Branch, Tianjin 300452, China

Abstract

Drilling engineering reports record geological information about oil and gas reservoirs together with drilling engineering parameters. Automatically extracting the unstructured information in these reports can substantially improve the efficiency of loading data into data lakes and thereby support efficient data management. However, these reports are domain-specific, and the variety of their structure and language makes accurate named entity recognition (NER) difficult. The deep neural network models commonly used for NER are currently trained or fine-tuned on small annotated datasets, which leads to two problems. First, without a large annotated corpus the training samples lack diversity, so models perform poorly on new or unseen data and generalize badly across data types. Second, existing models cannot model long-range context well: related entities may be scattered across long passages of a drilling engineering report, so these methods struggle to capture and recognize the relationships between named entities in complex documents. To address these problems, this paper proposes an intelligent NER method for drilling engineering that integrates rotational position embedding with masked conditional random fields. The method is built on a Transformer encoder, a bidirectional long short-term memory network (BiLSTM), and a conditional random field (CRF): the Transformer encoder uses a pre-trained language model to provide rich contextual semantic representations, the BiLSTM captures sequential dependencies, and the CRF performs sequence labeling.
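The CRF layer selects the highest-scoring tag sequence by Viterbi decoding over per-token emission scores (imagined here as BiLSTM outputs) and learned tag-transition scores. The following is a minimal sketch, not the authors' code, assuming a BIO tagging scheme; the entity type `FM` and all scores are hypothetical. Setting `use_mask=True` blocks transitions that BIO forbids, which is one plausible reading of the masked CRF described in the abstract; the sketch covers decoding only, not CRF training.

```python
import math
from typing import List, Sequence

NEG_INF = -math.inf


def bio_transition_allowed(prev_tag: str, tag: str) -> bool:
    """BIO rule (assumed scheme): I-X may only follow B-X or I-X; O and B-* may follow anything."""
    if not tag.startswith("I-"):
        return True
    entity_type = tag[2:]
    return prev_tag in (f"B-{entity_type}", f"I-{entity_type}")


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str],
    use_mask: bool = True,
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (e.g. from a BiLSTM).
    transitions[i][j]: score of moving from tag i to tag j.
    use_mask: score BIO-illegal transitions, and an I- tag at the
    first token, as -inf so they can never be decoded.
    """
    if not emissions:
        return []
    n = len(tags)
    # Best path score ending in each tag at token 0.
    score = [
        NEG_INF if use_mask and tags[j].startswith("I-") else emissions[0][j]
        for j in range(n)
    ]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n):
            best_i, best = 0, NEG_INF
            for i in range(n):
                if use_mask and not bio_transition_allowed(tags[i], tags[j]):
                    continue  # masked transition: never on the best path
                cand = score[i] + transitions[i][j]
                if cand > best:
                    best_i, best = i, cand
            new_score.append(best + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow back-pointers from the best final tag to recover the path.
    path = [max(range(n), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[j] for j in path]


tags = ["O", "B-FM", "I-FM"]  # "FM" is a hypothetical entity type
emissions = [[2.0, 0.0, 0.0], [0.0, 0.5, 1.0]]  # toy scores for 2 tokens
transitions = [[0.0] * 3 for _ in range(3)]     # untrained: all zero
print(viterbi_decode(emissions, transitions, tags, use_mask=False))  # ['O', 'I-FM']
print(viterbi_decode(emissions, transitions, tags))                  # ['O', 'B-FM']
```

With the mask off, the decoder simply follows the raw scores and returns an `I-FM` tag with no preceding `B-FM`, an inverted sequence; with the mask on, that path scores minus infinity and the decoder falls back to a valid sequence.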
In addition, the traditional CRF is improved with a masked modeling mechanism that prevents the generation of inverted label sequences, in which tags appear in an order the tagging scheme does not permit, and thereby makes the order of sequence labels more consistent. Integrating rotational position embedding further strengthens the model's awareness of relative positions in the text, helping it capture dependencies between distant words and therefore recognize named entities that span wider contexts. Beyond these model improvements, the paper addresses the shortage of training data by building a domain-specific named entity corpus. The corpus annotates 12 entity categories, with 20,727 entity labels in total across 4,000 text segments, which gives the model more diverse training samples and improves its generalization. Experimental results show that the proposed model achieves an F1 score of 86.49 on the test set, 2.65 percentage points higher than the previous best-performing model, and performs markedly better on entity types in the long tail of the distribution, which are usually underrepresented in training data. The method extends NER to drilling engineering and gives engineers an efficient tool for extracting key information. By accelerating the analysis of drilling data, it improves the efficiency of drilling operations management and of data lake ingestion, and ultimately supports better decision-making in drilling projects.
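Rotary position embedding rotates each pair of query/key dimensions by an angle proportional to the token's position, so the dot product between a rotated query and a rotated key depends only on their relative offset. The minimal sketch below assumes the paper's rotational position embedding follows the standard RoPE formulation (RoFormer) with the commonly used base of 10000; the vectors are made-up toy values, and the abstract does not say where in the encoder the rotation is applied.

```python
import math
from typing import List, Sequence


def rotary_embed(x: Sequence[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (x[2k], x[2k+1]) by position * base**(-2k/d).

    d = len(x) must be even; base=10000 is an assumed default.
    """
    d = len(x)
    if d % 2:
        raise ValueError("rotary embedding needs an even dimension")
    rotated: List[float] = []
    for i in range(0, d, 2):  # i = 2k
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[i], x[i + 1]
        # Standard 2-D rotation of the pair by `angle`.
        rotated.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return rotated


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def attention_score(q: Sequence[float], k: Sequence[float], q_pos: int, k_pos: int) -> float:
    """Unscaled attention logit between a query at q_pos and a key at k_pos."""
    return dot(rotary_embed(q, q_pos), rotary_embed(k, k_pos))


# Toy 4-dimensional query and key (hypothetical values).
q = [1.0, 0.5, -0.3, 0.8]
k = [0.2, -1.0, 0.7, 0.4]
near = attention_score(q, k, q_pos=3, k_pos=1)
far = attention_score(q, k, q_pos=10, k_pos=8)  # same offset, both shifted by 7
print(math.isclose(near, far))  # True: the score depends only on q_pos - k_pos
```

Shifting both positions by the same amount leaves the score unchanged, which is the relative-position property the abstract credits for capturing dependencies between distant words. Because each step is a rotation, vector norms are also preserved.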


Keywords: named entity recognition; drilling engineering; Transformer encoder; natural language processing; deep learning
Received: 2024-10-31
Funding: Jointly supported by the National Natural Science Foundation of China (No. 62402526) and the Research Start-up Fund of China University of Petroleum-Beijing (No. 2462024BJRC013).
Corresponding author: LIN Botao, [email protected]
Cite this article:
CAO Qianwen, LI Wei, LIN Botao, JIN Yan, HAN Xueyin, ZHANG Jiahao. Intelligent named entities recognition for drilling engineering by integrating rotational position embedding and masked conditional random fields. Petroleum Science Bulletin, 2024, 09(05): 750-763.
Copyright 2016 Editorial Office of Petroleum Science Bulletin