
Scrapy Tutorial

In this tutorial, we’ll assume that Scrapy is already installed on your system. If that’s not the case, see Installation guide.

We are going to scrape quotes.toscrape.com, a website that lists quotes from famous authors.

This tutorial will walk you through these tasks:

  1. Creating a new Scrapy project

  2. Writing a spider to crawl a site and extract data

  3. Exporting the scraped data using the command line

  4. Changing the spider to recursively follow links

  5. Using spider arguments

Scrapy is written in Python. The more you learn about Python, the more you can get out of Scrapy.

If you’re already familiar with other languages and want to learn Python quickly, the Python Tutorial is a good resource.

If you’re new to programming and want to start with Python, the following books may be useful to you:

  • Automate the Boring Stuff With Python

  • How To Think Like a Computer Scientist

  • Learn Python 3 The Hard Way

You can also take a look at this list of Python resources for non-programmers, as well as the suggested resources in the learnpython-subreddit.

Creating a project

Before you start scraping, you will have to set up a new Scrapy project. Enter a directory where you’d like to store your code and run:

scrapy startproject tutorial

This will create a tutorial directory with the following contents:

tutorial/
    scrapy.cfg            # deploy configuration file
    tutorial/             # project's Python module, you'll import your code from here
        __init__.py
        items.py          # project items definition file
        middlewares.py    # project middlewares file
        pipelines.py      # project pipelines file
        settings.py       # project settings file
        spiders/          # a directory where you'll later put your spiders
            __init__.py

Our first Spider

Spiders are classes that you define and that Scrapy uses to scrape information from a website (or a group of websites). They must subclass Spider and define the initial requests to be made, and optionally, how to follow links in pages and parse the downloaded page content to extract data.

This is the code for our first Spider. Save it in a file named quotes_spider.py under the tutorial/spiders directory in your project:

from pathlib import Path

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            "https://quotes.toscrape.com/page/1/",
            "https://quotes.toscrape.com/page/2/",
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = f"quotes-{page}.html"
        Path(filename).write_bytes(response.body)
        self.log(f"Saved file {filename}")

As you can see, our Spider subclasses scrapy.Spider and defines some attributes and methods:

  • name: identifies the Spider. It must be unique within a project, that is, you can’t set the same name for different Spiders.

  • start_requests(): must return an iterable of Requests (you can return a plain list of requests or write a generator function, as shown in the sketch after this list) which the Spider will begin to crawl from. Subsequent requests will be generated successively from these initial requests.

  • parse(): a method that will be called to handle the response downloaded for each of the requests made. The response parameter is an instance of TextResponse that holds the page content and has further helpful methods to handle it.

    The parse() method usually parses the response, extracting the scraped data as dicts and also finding new URLs to follow and creating new requests (Request) from them.
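As a small illustration of the point about start_requests() above, the method can simply return a plain list instead of being written as a generator; this sketch is equivalent to the generator version used in our spider:

def start_requests(self):
    # Returning a plain list also works; Scrapy only needs an iterable of Requests.
    return [
        scrapy.Request("https://quotes.toscrape.com/page/1/", callback=self.parse),
        scrapy.Request("https://quotes.toscrape.com/page/2/", callback=self.parse),
    ]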

How to run our spider

To put our spider to work, go to the project’s top level directory and run:

scrapy crawl quotes

This command runs the spider named quotes that we’ve just added, which will send some requests to the quotes.toscrape.com domain. You will get an output similar to this:

... (omitted for brevity)
2016-12-16 21:24:05 [scrapy.core.engine] INFO: Spider opened
2016-12-16 21:24:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-12-16 21:24:05 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://quotes.toscrape.com/robots.txt> (referer: None)
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/page/1/> (referer: None)
2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/page/2/> (referer: None)
2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-1.html
2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-2.html
2016-12-16 21:24:05 [scrapy.core.engine] INFO: Closing spider (finished)
...

Now, check the files in the current directory. You should notice that two new files have been created: quotes-1.html and quotes-2.html, with the content for the respective URLs, as our parse method instructs.

Note

If you are wondering why we haven’t parsed the HTML yet, hold on, we will cover that soon.

What just happened under the hood?

Scrapy schedules the scrapy.Request objects returned by the start_requests method of the Spider. Upon receiving a response for each one, it instantiates Response objects and calls the callback method associated with the request (in this case, the parse method) passing the response as an argument.

A shortcut to the start_requests method

Instead of implementing a start_requests() method that generates scrapy.Request objects from URLs, you can just define a start_urls class attribute with a list of URLs. This list will then be used by the default implementation of start_requests() to create the initial requests for your spider.

from pathlib import Path

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        "https://quotes.toscrape.com/page/1/",
        "https://quotes.toscrape.com/page/2/",
    ]

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = f"quotes-{page}.html"
        Path(filename).write_bytes(response.body)

The parse() method will be called to handle each of the requests for those URLs, even though we haven’t explicitly told Scrapy to do so. This happens because parse() is Scrapy’s default callback method, which is called for requests without an explicitly assigned callback.

Extracting data

The best way to learn how to extract data with Scrapy is trying selectors using the Scrapy shell. Run:

scrapy shell 'https://quotes.toscrape.com/page/1/'

Note

Remember to always enclose URLs in quotes when running the Scrapy shell from the command line; otherwise, URLs containing arguments (i.e. the & character) will not work.

On Windows, use double quotes instead:

scrapy shell "https://quotes.toscrape.com/page/1/"

You will see something like:

[ ... Scrapy log here ... ]
2016-09-19 12:09:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://quotes.toscrape.com/page/1/> (referer: None)
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x7fa91d888c90>
[s]   item       {}
[s]   request    <GET https://quotes.toscrape.com/page/1/>
[s]   response   <200 https://quotes.toscrape.com/page/1/>
[s]   settings   <scrapy.settings.Settings object at 0x7fa91d888c10>
[s]   spider     <DefaultSpider 'default' at 0x7fa91c8af990>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser

Using the shell, you can try selecting elements using CSS with the response object:

>>> response.css("title")
[<Selector query='descendant-or-self::title' data='<title>Quotes to Scrape</title>'>]

The result of running response.css('title') is a list-like object called SelectorList, which represents a list of Selector objects that wrap around XML/HTML elements and allow you to run further queries to refine the selection or extract the data.

To extract the text from the title above, you can do:

>>> response.css("title::text").getall()
['Quotes to Scrape']

There are two things to note here: one is that we’ve added ::text to the CSS query, to mean we want to select only the text elements directly inside the <title> element. If we don’t specify ::text, we’d get the full title element, including its tags:

>>> response.css("title").getall()
['<title>Quotes to Scrape</title>']

The other thing is that the result of calling .getall() is a list: it is possible that a selector returns more than one result, so we extract them all. When you know you just want the first result, as in this case, you can do:

>>> response.css("title::text").get()
'Quotes to Scrape'

As an alternative, you could’ve written:

>>> response.css("title::text")[0].get()
'Quotes to Scrape'

Accessing an index on a SelectorList instance will raise an IndexError exception if there are no results:

>>> response.css("noelement")[0].get()
Traceback (most recent call last):
...
IndexError: list index out of range

You might want to use .get() directly on the SelectorList instance instead, which returns None if there are no results:

>>> response.css("noelement").get()

There’s a lesson here: for most scraping code, you want it to be resilient to errors due to things not being found on a page, so that even if some parts fail to be scraped, you can at least get some data.
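One simple way to make your extraction code resilient is to pass a default value to .get(), so missing elements produce a placeholder instead of None (the "not-found" string here is just an example):

>>> response.css("noelement").get(default="not-found")
'not-found'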

Besides the getall() and get() methods, you can also use the re() method to extract using regular expressions:

>>> response.css("title::text").re(r"Quotes.*")
['Quotes to Scrape']
>>> response.css("title::text").re(r"Q\w+")
['Quotes']
>>> response.css("title::text").re(r"(\w+) to (\w+)")
['Quotes', 'Scrape']

In order to find the proper CSS selectors to use, you might find it useful to open the response page from the shell in your web browser using view(response). You can use your browser’s developer tools to inspect the HTML and come up with a selector (see Using your browser’s Developer Tools for scraping).

Selector Gadget is also a nice tool to quickly find CSS selectors for visually selected elements; it works in many browsers.

XPath: a brief intro

Besides CSS, Scrapy selectors also support using XPath expressions:

>>> response.xpath("//title")
[<Selector query='//title' data='<title>Quotes to Scrape</title>'>]
>>> response.xpath("//title/text()").get()
'Quotes to Scrape'

XPath expressions are very powerful, and are the foundation of Scrapy Selectors. In fact, CSS selectors are converted to XPath under-the-hood. You can see that if you read the text representation of the selector objects in the shell closely.

While perhaps not as popular as CSS selectors, XPath expressions offer more power because, besides navigating the structure, they can also look at the content. Using XPath, you’re able to select things like: the link that contains the text “Next Page”. This makes XPath very fitting for the task of scraping, and we encourage you to learn XPath even if you already know how to construct CSS selectors; it will make scraping much easier.
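For instance, matching a link by its text is easy in XPath but not possible with plain CSS selectors. A small sketch in the shell (assuming the page 1 response loaded above; the expected output is the relative URL of the next page):

>>> response.xpath('//a[contains(text(), "Next")]/@href').get()
'/page/2/'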

We won’t cover much of XPath here, but you can read more about using XPath with Scrapy Selectors here. To learn more about XPath, we recommend this tutorial to learn XPath through examples, and this tutorial to learn “how to think in XPath”.

Extracting quotes and authors

Now that you know a bit about selection and extraction, let’s complete our spider by writing the code to extract the quotes from the web page.

Each quote in https://quotes.toscrape.com is represented by HTML elements that look like this:

<div class="quote">
    <span class="text">“The world as we have created it is a process of our
    thinking. It cannot be changed without changing our thinking.”</span>
    <span>
        by <small class="author">Albert Einstein</small>
        <a href="/author/Albert-Einstein">(about)</a>
    </span>
    <div class="tags">
        Tags:
        <a class="tag" href="/tag/change/page/1/">change</a>
        <a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
        <a class="tag" href="/tag/thinking/page/1/">thinking</a>
        <a class="tag" href="/tag/world/page/1/">world</a>
    </div>
</div>

Let’s open up scrapy shell and play a bit to find out how to extract the data we want:

scrapy shell 'https://quotes.toscrape.com'

We get a list of selectors for the quote HTML elements with:

>>> response.css("div.quote")
[<Selector query="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' quote ')]" data='<div class="quote" itemscope itemtype...'>,
<Selector query="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' quote ')]" data='<div class="quote" itemscope itemtype...'>,
...]

Each of the selectors returned by the query above allows us to run further queries over their sub-elements. Let’s assign the first selector to a variable, so that we can run our CSS selectors directly on a particular quote:

>>> quote = response.css("div.quote")[0]

Now, let’s extract the text, author and tags from that quote using the quote object we just created:

>>> text = quote.css("span.text::text").get()
>>> text
'“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”'
>>> author = quote.css("small.author::text").get()
>>> author
'Albert Einstein'

Given that the tags are a list of strings, we can use the .getall() method to get all of them:

>>> tags = quote.css("div.tags a.tag::text").getall()
>>> tags
['change', 'deep-thoughts', 'thinking', 'world']

Having figured out how to extract each bit, we can now iterate over all the quote elements and put them together into a Python dictionary:

>>> for quote in response.css("div.quote"):
...     text = quote.css("span.text::text").get()
...     author = quote.css("small.author::text").get()
...     tags = quote.css("div.tags a.tag::text").getall()
...     print(dict(text=text, author=author, tags=tags))
...
{'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'author': 'Albert Einstein', 'tags': ['change', 'deep-thoughts', 'thinking', 'world']}
{'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 'author': 'J.K. Rowling', 'tags': ['abilities', 'choices']}
...

Extracting data in our spider

Let’s get back to our spider. Until now, it hasn’t extracted any data in particular; it just saves the whole HTML page to a local file. Let’s integrate the extraction logic above into our spider.

A Scrapy spider typically generates many dictionaries containing the data extracted from the page. To do that, we use the yield Python keyword in the callback, as you can see below:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        "https://quotes.toscrape.com/page/1/",
        "https://quotes.toscrape.com/page/2/",
    ]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
                "tags": quote.css("div.tags a.tag::text").getall(),
            }

To run this spider, exit the scrapy shell by entering:

quit()

Then, run:

scrapy crawl quotes

Now, it should output the extracted data with the log:

2016-09-19 18:57:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/1/>
{'tags': ['life', 'love'], 'author': 'André Gide', 'text': '“It is better to be hated for what you are than to be loved for what you are not.”'}
2016-09-19 18:57:19 [scrapy.core.scraper] DEBUG: Scraped from <200 https://quotes.toscrape.com/page/1/>
{'tags': ['edison', 'failure', 'inspirational', 'paraphrased'], 'author': 'Thomas A. Edison', 'text': "“I have not failed. I've just found 10,000 ways that won't work.”"}

Storing the scraped data

The simplest way to store the scraped data is by using Feed exports, with the following command:

scrapy crawl quotes -O quotes.json

That will generate a quotes.json file containing all scraped items, serialized in JSON.
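If you prefer configuring the export inside the project rather than on the command line, newer Scrapy versions (2.1+) also support declaring feeds in settings.py through the FEEDS setting; a minimal sketch equivalent to the -O switch above:

# in tutorial/settings.py
FEEDS = {
    "quotes.json": {
        "format": "json",
        "overwrite": True,  # same behaviour as the -O switch
    },
}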

The -O command-line switch overwrites any existing file; use -o instead to append new content to any existing file. However, appending to a JSON file makes the file contents invalid JSON. When appending to a file, consider using a different serialization format, such as JSON Lines:

scrapy crawl quotes -o quotes.jsonl

The JSON Lines format is useful because it’s stream-like, so you can easily append new records to it. It doesn’t have the same problem as JSON when you run the command twice. Also, as each record is a separate line, you can process big files without having to fit everything in memory; there are tools like JQ to help with that at the command line.
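As a quick illustration of that stream-like property, here is a minimal Python sketch that processes the quotes.jsonl file above one record at a time, without loading it all into memory:

import json

with open("quotes.jsonl", encoding="utf-8") as f:
    for line in f:
        item = json.loads(line)  # each line is a standalone JSON object
        print(item["author"], "-", item["text"][:40])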

In small projects (like the one in this tutorial), that should be enough. However, if you want to perform more complex things with the scraped items, you can write an Item Pipeline. A placeholder file for Item Pipelines was set up for you when the project was created, in tutorial/pipelines.py. You don’t need to implement any item pipelines, though, if you just want to store the scraped items.
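For illustration only, here is a minimal sketch of what such a pipeline could look like; the class name and the filtering rule are made up for this example:

# tutorial/pipelines.py
from scrapy.exceptions import DropItem


class DropShortQuotesPipeline:
    def process_item(self, item, spider):
        # Keep only reasonably long quotes; drop the rest.
        if len(item.get("text", "")) > 20:
            return item
        raise DropItem("quote too short")

To enable a pipeline like this, you would also list it in the ITEM_PIPELINES setting in tutorial/settings.py, for example {"tutorial.pipelines.DropShortQuotesPipeline": 300}.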

Following links

Let’s say that, instead of just scraping the first two pages of https://quotes.toscrape.com, you want quotes from all the pages of the website.

Now that you know how to extract data from pages, let’s see how to follow links from them.

The first thing to do is extract the link to the page we want to follow. Examining our page, we can see there is a link to the next page with the following markup:

<ul class="pager">
    <li class="next">
        <a href="/page/2/">Next <span aria-hidden="true">&rarr;</span></a>
    </li>
</ul>

We can try extracting it in the shell:

>>> response.css('li.next a').get()
'<a href="/page/2/">Next <span aria-hidden="true">→</span></a>'

This gets the anchor element, but we want the attribute href. For that, Scrapy supports a CSS extension that lets you select the attribute contents, like this:

>>> response.css("li.next a::attr(href)").get()
'/page/2/'

There is also an attrib property available (see Selecting element attributes for more):

>>> response.css("li.next a").attrib["href"]
'/page/2/'

Now let’s see our spider, modified to recursively follow the link to the next page, extracting data from it:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        "https://quotes.toscrape.com/page/1/",
    ]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
                "tags": quote.css("div.tags a.tag::text").getall(),
            }

        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)

Now, after extracting the data, the parse() method looks for the link to the next page, builds a full absolute URL using the urljoin() method (since the links can be relative) and yields a new request to the next page, registering itself as the callback to handle the data extraction for the next page and to keep the crawl going through all the pages.

What you see here is Scrapy’s mechanism of following links: when you yield a Request in a callback method, Scrapy will schedule that request to be sent and register a callback method to be executed when that request finishes.

Using this, you can build complex crawlers that follow links according to rules you define, and extract different kinds of data depending on the page it’s visiting.

In our example, it creates a sort of loop, following all the links to the next page until it doesn’t find one – handy for crawling blogs, forums and other sites with pagination.

A shortcut for creating Requests

As a shortcut for creating Request objects you can use response.follow:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        "https://quotes.toscrape.com/page/1/",
    ]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("span small::text").get(),
                "tags": quote.css("div.tags a.tag::text").getall(),
            }

        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

Unlike scrapy.Request, response.follow supports relative URLs directly, so there is no need to call urljoin. Note that response.follow just returns a Request instance; you still have to yield this Request.

You can also pass a selector to response.follow instead of a string; this selector should extract the necessary attributes:

for href in response.css("ul.pager a::attr(href)"):
    yield response.follow(href, callback=self.parse)

For <a> elements there is a shortcut: response.follow uses their href attribute automatically. So the code can be shortened further:

for a in response.css("ul.pager a"):
    yield response.follow(a, callback=self.parse)

To create multiple requests from an iterable, you can use response.follow_all instead:

anchors = response.css("ul.pager a")
yield from response.follow_all(anchors, callback=self.parse)

or, shortening it further:

yield from response.follow_all(css="ul.pager a", callback=self.parse)

More examples and patterns

Here is another spider that illustrates callbacks and following links, this time for scraping author information:

import scrapy


class AuthorSpider(scrapy.Spider):
    name = "author"

    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        author_page_links = response.css(".author + a")
        yield from response.follow_all(author_page_links, self.parse_author)

        pagination_links = response.css("li.next a")
        yield from response.follow_all(pagination_links, self.parse)

    def parse_author(self, response):
        def extract_with_css(query):
            return response.css(query).get(default="").strip()

        yield {
            "name": extract_with_css("h3.author-title::text"),
            "birthdate": extract_with_css(".author-born-date::text"),
            "bio": extract_with_css(".author-description::text"),
        }

This spider will start from the main page and follow all the links to the author pages, calling the parse_author callback for each of them, as well as the pagination links with the parse callback, as we saw before.

Here we’re passing callbacks to response.follow_all as positional arguments to make the code shorter; it also works for Request.

The parse_author callback defines a helper function to extract and clean up the data from a CSS query and yields the Python dict with the author data.

Another interesting thing this spider demonstrates is that, even if there are many quotes from the same author, we don’t need to worry about visiting the same author page multiple times. By default, Scrapy filters out duplicated requests to URLs already visited, avoiding the problem of hitting servers too much because of a programming mistake. This can be configured in the DUPEFILTER_CLASS setting.
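Should you ever need to revisit a URL deliberately, individual requests can opt out of this duplicate filter via the dont_filter flag; a small sketch:

yield scrapy.Request(
    "https://quotes.toscrape.com/page/1/",
    callback=self.parse,
    dont_filter=True,  # send this request even if the URL was already visited
)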

Hopefully by now you have a good understanding of how to use the mechanism of following links and callbacks with Scrapy.

As yet another example spider that leverages the mechanism of following links, check out the CrawlSpider class: a generic spider that implements a small rules engine that you can use to write your crawlers on top of.

Also, a common pattern is to build an item with data from more than one page, using a trick to pass additional data to the callbacks.
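One such trick is the cb_kwargs argument accepted by Request and response.follow, which forwards extra keyword arguments to the callback. A minimal sketch (the field names here are illustrative, not part of the tutorial spider):

def parse(self, response):
    for quote in response.css("div.quote"):
        author_href = quote.css(".author + a::attr(href)").get()
        yield response.follow(
            author_href,
            callback=self.parse_author,
            cb_kwargs={"quote_text": quote.css("span.text::text").get()},
        )

def parse_author(self, response, quote_text):
    # quote_text was scraped on the previous page and passed in via cb_kwargs
    yield {
        "quote_text": quote_text,
        "author": response.css("h3.author-title::text").get(default="").strip(),
    }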

Using spider arguments

You can provide command line arguments to your spiders by using the -a option when running them:

scrapy crawl quotes -O quotes-humor.json -a tag=humor

These arguments are passed to the Spider’s __init__ method and become spider attributes by default.
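By default the base Spider.__init__ simply stores each -a key=value pair as an attribute; if you want to handle an argument explicitly, you can override __init__ yourself, as in this sketch (not required for this tutorial):

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def __init__(self, tag=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.tag = tag  # equivalent to the default attribute assignment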

In this example, the value provided for the tag argument will be available via self.tag. You can use this to make your spider fetch only quotes with a specific tag, building the URL based on the argument:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        url = "https://quotes.toscrape.com/"
        tag = getattr(self, "tag", None)
        if tag is not None:
            url = url + "tag/" + tag
        yield scrapy.Request(url, self.parse)

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }

        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)

If you pass the tag=humor argument to this spider, you’ll notice that it will only visit URLs from the humor tag, such as https://quotes.toscrape.com/tag/humor.

You can learn more about handling spider arguments here.

Next steps

This tutorial covered only the basics of Scrapy, but there are a lot of other features not mentioned here. Check the What else? section in the Scrapy at a glance chapter for a quick overview of the most important ones.

You can continue from the section Basic concepts to learn more about the command-line tool, spiders, selectors, and other things the tutorial hasn’t covered, like modeling the scraped data. If you’d prefer to play with an example project, check the Examples section.
