Labridge

Source analyze

`labridge.func_modules.paper.parse.extractors.source_analyze`

`labridge.func_modules.paper.parse.extractors.source_analyze.PaperSourceAnalyzer`

This class analyzes the source of a paper, such as 'Nature' or 'IEEE'.

By default, the source analysis is based on keyword occurrence counts. An LLM can also be used to help analyze the source.

PARAMETER	DESCRIPTION
`llm`	The used LLM. TYPE: `LLM` DEFAULT: `None`
`service_context`	The service context. TYPE: `ServiceContext` DEFAULT: `None`
`keyword_count_threshold`	A PaperSource is selected as a candidate only if its keyword occurrence count exceeds this threshold. TYPE: `int` DEFAULT: `10`

Source code in labridge\func_modules\paper\parse\extractors\source_analyze.py

class PaperSourceAnalyzer:
	r"""
	This class analyzes the source of a paper, such as 'Nature' or 'IEEE'.

	By default, the source analysis is based on keyword occurrence counts.
	An LLM can also be used to help analyze the source.

	Args:
		llm (LLM): The used LLM.
		service_context (ServiceContext): The service context.
		keyword_count_threshold (int): A PaperSource is selected as a candidate
			only if its keyword occurrence count exceeds this threshold.
	"""
	def __init__(
		self,
		llm: LLM = None,
		service_context: ServiceContext = None,
		keyword_count_threshold: int = 10,
	):
		self.llm = llm or llm_from_settings_or_context(Settings, service_context)
		self.keyword_count_threshold = keyword_count_threshold

	def reader_analyze(self, paper_path: Union[Path, str]) -> PaperSource:
		"""
		Analyze the paper source using a structured pdf reader.

		Args:
			paper_path (Union[Path, str]): The paper path.

		Returns:
			PaperSource: The paper source.
		"""
		import PyPDF2

		src_string = None
		with open(paper_path, 'rb') as file:
			fileReader = PyPDF2.PdfReader(file)
			trailer = fileReader.trailer
			# Not every PDF carries a document-information dictionary.
			file_info = trailer['/Info'] if '/Info' in trailer else {}
			# Read the subject while the file is still open.
			if '/Subject' in file_info:
				src_string = str(file_info['/Subject'])

		source = None
		if src_string is not None and len(src_string) >= len(PaperSource.NATURE):
			# A subject that does not mention Nature is treated as IEEE.
			source = PaperSource.IEEE
			if PaperSource.NATURE.upper() in src_string.upper():
				source = PaperSource.NATURE
		return source

	def llm_analyze(self, paper_path: Union[Path, str]) -> PaperSource:
		""" TODO: use the LLM. Currently this always returns the default source. """
		return PaperSource.DEFAULT

	def keyword_analyze(self, paper_path: Union[Path, str]) -> PaperSource:
		r"""
		Analyze the paper source based on keyword occurrence count.

		Args:
			paper_path (Union[Path, str]): The paper path.

		Returns:
			PaperSource: The analyzed paper source.
		"""
		import pymupdf
		import re

		# Extract the text of every page, closing the document afterwards.
		with pymupdf.open(paper_path) as doc:
			pages = [page.get_text() for page in doc]

		# Count whole-word, case-insensitive occurrences of the Nature keyword.
		keyword = PaperSource.NATURE.upper()
		count = 0
		for page_text in pages:
			for t in re.findall(r"\w+", page_text):
				if t.upper() == keyword:
					count += 1
		# This method always decides: NATURE above the threshold, IEEE otherwise.
		if count > self.keyword_count_threshold:
			source = PaperSource.NATURE
		else:
			source = PaperSource.IEEE
		return source

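	def analyze_source(self, paper_path: Union[Path, str], use_llm: bool = False) -> PaperSource:
		r"""
		Sequentially use `reader_analyze`, `keyword_analyze`, and `llm_analyze` to analyze the paper source.
		The first method that returns a source wins.

		Args:
			paper_path (Union[Path, str]): The paper path.
			use_llm (bool): Whether to use `llm_analyze`.

		Returns:
			PaperSource: The analyzed paper source.
		"""
		# 1. Try the PDF metadata first.
		source = self.reader_analyze(paper_path)
		# 2. Fall back to counting keyword occurrences in the text.
		if source is None:
			source = self.keyword_analyze(paper_path)
		# NOTE: `keyword_analyze` always returns NATURE or IEEE, so the two
		# fallbacks below are only reached if it is changed to return None.
		if source is None and use_llm:
			source = self.llm_analyze(paper_path)
		if source is None:
			source = PaperSource.DEFAULT
		return source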

`labridge.func_modules.paper.parse.extractors.source_analyze.PaperSourceAnalyzer.analyze_source(paper_path, use_llm=False)`

Sequentially use `reader_analyze`, `keyword_analyze`, and `llm_analyze` to analyze the paper source. The first method that returns a source wins.

PARAMETER	DESCRIPTION
`paper_path`	The paper path. TYPE: `Union[Path, str]`
`use_llm`	Whether to use `llm_analyze`. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`PaperSource`	The analyzed paper source. TYPE: `PaperSource`

Source code in labridge\func_modules\paper\parse\extractors\source_analyze.py

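def analyze_source(self, paper_path: Union[Path, str], use_llm: bool = False) -> PaperSource:
	r"""
	Sequentially use `reader_analyze`, `keyword_analyze`, and `llm_analyze` to analyze the paper source.
	The first method that returns a source wins.

	Args:
		paper_path (Union[Path, str]): The paper path.
		use_llm (bool): Whether to use `llm_analyze`.

	Returns:
		PaperSource: The analyzed paper source.
	"""
	# 1. Try the PDF metadata first.
	source = self.reader_analyze(paper_path)
	# 2. Fall back to counting keyword occurrences in the text.
	if source is None:
		source = self.keyword_analyze(paper_path)
	# NOTE: `keyword_analyze` always returns NATURE or IEEE, so the two
	# fallbacks below are only reached if it is changed to return None.
	if source is None and use_llm:
		source = self.llm_analyze(paper_path)
	if source is None:
		source = PaperSource.DEFAULT
	return source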
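
The fallback order used by `analyze_source` can be illustrated without any PDF at all. The following is a minimal sketch, not the library's implementation: `first_decided`, `abstain`, `always_ieee`, `placeholder_llm` and the string constants are hypothetical stand-ins for the analyzer methods and the `PaperSource` values.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-ins for the PaperSource values; the real ones live in labridge.
NATURE, IEEE, DEFAULT = "Nature", "IEEE", "Default"

# An analyzer takes a paper path and returns a source, or None if it cannot decide.
Analyzer = Callable[[str], Optional[str]]


def first_decided(paper_path: str, analyzers: Sequence[Analyzer], default: str = DEFAULT) -> str:
    """Return the first non-None result, or `default` if every analyzer abstains."""
    for analyze in analyzers:
        source = analyze(paper_path)
        if source is not None:
            return source
    return default


def abstain(paper_path: str) -> Optional[str]:
    """An analyzer that cannot decide, e.g. a PDF without '/Subject' metadata."""
    return None


def always_ieee(paper_path: str) -> Optional[str]:
    """An analyzer that always decides, as a keyword count below the threshold does."""
    return IEEE


def placeholder_llm(paper_path: str) -> Optional[str]:
    """Mirrors the TODO-style LLM step, which only returns the default source."""
    return DEFAULT


# The second analyzer decides, so the third one is never consulted.
print(first_decided("paper.pdf", [abstain, always_ieee, placeholder_llm]))  # IEEE
# Every analyzer abstains, so the default is used.
print(first_decided("paper.pdf", [abstain, abstain]))  # Default
```

The sketch also shows why an analyzer that never returns `None` makes every later step in the chain unreachable.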

`labridge.func_modules.paper.parse.extractors.source_analyze.PaperSourceAnalyzer.keyword_analyze(paper_path)`

Analyze the paper source based on keyword occurrence count.

PARAMETER	DESCRIPTION
`paper_path`	The paper path. TYPE: `Union[Path, str]`

RETURNS	DESCRIPTION
`PaperSource`	The analyzed paper source. TYPE: `PaperSource`

Source code in labridge\func_modules\paper\parse\extractors\source_analyze.py

def keyword_analyze(self, paper_path: Union[Path, str]) -> PaperSource:
	r"""
	Analyze the paper source based on keyword occurrence count.

	Args:
		paper_path (Union[Path, str]): The paper path.

	Returns:
		PaperSource: The analyzed paper source.
	"""
	import pymupdf
	import re

	# Extract the text of every page, closing the document afterwards.
	with pymupdf.open(paper_path) as doc:
		pages = [page.get_text() for page in doc]

	# Count whole-word, case-insensitive occurrences of the Nature keyword.
	keyword = PaperSource.NATURE.upper()
	count = 0
	for page_text in pages:
		for t in re.findall(r"\w+", page_text):
			if t.upper() == keyword:
				count += 1
	# This method always decides: NATURE above the threshold, IEEE otherwise.
	if count > self.keyword_count_threshold:
		source = PaperSource.NATURE
	else:
		source = PaperSource.IEEE
	return source
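
The counting rule can be tried on plain strings, without opening a PDF. This is a minimal sketch under stated assumptions: `count_keyword` and `classify_by_keyword` are hypothetical helpers, and the keyword `"Nature"` and the returned labels are assumed stand-ins for the `PaperSource` values.

```python
import re
from typing import Iterable


def count_keyword(pages: Iterable[str], keyword: str) -> int:
    """Count whole-word, case-insensitive occurrences of `keyword` across page texts."""
    target = keyword.upper()
    return sum(
        1
        for text in pages
        for token in re.findall(r"\w+", text)
        if token.upper() == target
    )


def classify_by_keyword(pages: Iterable[str], keyword: str = "Nature", threshold: int = 10) -> str:
    """Return "Nature" only if the count strictly exceeds the threshold, else "IEEE"."""
    return "Nature" if count_keyword(pages, keyword) > threshold else "IEEE"


pages = [
    "Nature Communications. nature, NATURE!",  # three whole-word matches
    "Denatured proteins; natures",             # no whole-word match
]
print(count_keyword(pages, "Nature"))            # 3
print(classify_by_keyword(pages, threshold=2))   # Nature
print(classify_by_keyword(pages, threshold=3))   # IEEE
```

Because tokens are compared whole, words that merely contain the keyword ("Denatured", "natures") do not count, and the comparison with the threshold is strict.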

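`labridge.func_modules.paper.parse.extractors.source_analyze.PaperSourceAnalyzer.llm_analyze(paper_path)`

Placeholder for LLM-based source analysis. It is not implemented yet and always returns `PaperSource.DEFAULT`.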

Source code in labridge\func_modules\paper\parse\extractors\source_analyze.py

def llm_analyze(self, paper_path: Union[Path, str]) -> PaperSource:
	""" TODO: use the LLM. Currently this always returns the default source. """
	return PaperSource.DEFAULT

`labridge.func_modules.paper.parse.extractors.source_analyze.PaperSourceAnalyzer.reader_analyze(paper_path)`

Analyze the paper source using a structured pdf reader.

PARAMETER	DESCRIPTION
`paper_path`	The paper path. TYPE: `Union[Path, str]`

RETURNS	DESCRIPTION
`PaperSource`	The paper source. TYPE: `PaperSource`

Source code in labridge\func_modules\paper\parse\extractors\source_analyze.py

def reader_analyze(self, paper_path: Union[Path, str]) -> PaperSource:
	"""
	Analyze the paper source using a structured pdf reader.

	Args:
		paper_path (Union[Path, str]): The paper path.

	Returns:
		PaperSource: The paper source.
	"""
	import PyPDF2

	src_string = None
	with open(paper_path, 'rb') as file:
		fileReader = PyPDF2.PdfReader(file)
		trailer = fileReader.trailer
		# Not every PDF carries a document-information dictionary.
		file_info = trailer['/Info'] if '/Info' in trailer else {}
		# Read the subject while the file is still open.
		if '/Subject' in file_info:
			src_string = str(file_info['/Subject'])

	source = None
	if src_string is not None and len(src_string) >= len(PaperSource.NATURE):
		# A subject that does not mention Nature is treated as IEEE.
		source = PaperSource.IEEE
		if PaperSource.NATURE.upper() in src_string.upper():
			source = PaperSource.NATURE
	return source
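
The decision made on the `/Subject` metadata can be isolated from the PDF reading. The sketch below is an illustration only: `source_from_subject` is a hypothetical helper, the string constants are assumed stand-ins for the `PaperSource` values, and the sample subject strings are made up.

```python
from typing import Optional

# Hypothetical stand-ins for the PaperSource values; the real ones live in labridge.
NATURE, IEEE = "Nature", "IEEE"


def source_from_subject(subject: Optional[str]) -> Optional[str]:
    """Classify a paper from its '/Subject' metadata string.

    Returns None when there is no subject or it is shorter than the keyword,
    NATURE when the subject mentions the keyword (case-insensitive),
    and IEEE for any other sufficiently long subject.
    """
    if subject is None or len(subject) < len(NATURE):
        return None
    if NATURE.upper() in subject.upper():
        return NATURE
    return IEEE


print(source_from_subject("Nature Physics"))        # Nature
print(source_from_subject("IEEE Transactions"))     # IEEE
print(source_from_subject("abc"))                   # None
print(source_from_subject(None))                    # None
```

Note that a substring test is used here, so unlike the keyword count it also matches the keyword inside longer words.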