Live OnPage API Content Parsing
This endpoint parses the content of any page you specify and returns the structured content of the target page, including link URLs, anchors, hea…
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
URL of the page to parse (required field); example: https://www.fujielectric.com/
custom user agent (optional field): custom user agent for crawling a website; example: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36; default value: Mozilla/5.0 (compatible; RSiteAuditor)
preset for browser screen parameters (optional field): if you use this field, you don't need to specify browser_screen_width, browser_screen_height, or browser_screen_scale_factor; possible values: desktop, mobile, tablet. The desktop preset applies browser_screen_width: 1920, browser_screen_height: 1080, browser_screen_scale_factor: 1; the mobile preset applies browser_screen_width: 390, browser_screen_height: 844, browser_screen_scale_factor: 3; the tablet preset applies browser_screen_width: 1024, browser_screen_height: 1366, browser_screen_scale_factor: 2. Note: to use this parameter, set enable_javascript or enable_browser_rendering to true
browser screen width (optional field): custom browser screen width, used to audit the page as it renders on a particular device; if you set this field, browser_preset is ignored, so you don't need to specify it; minimum value: 240 pixels; maximum value: 9999 pixels. Note: to use this parameter, set enable_javascript or enable_browser_rendering to true
browser screen height (optional field): custom browser screen height, used to audit the page as it renders on a particular device; if you set this field, browser_preset is ignored, so you don't need to specify it; minimum value: 240 pixels; maximum value: 9999 pixels. Note: to use this parameter, set enable_javascript or enable_browser_rendering to true
browser screen scale factor (optional field): custom browser screen resolution ratio, used to audit the page as it renders on a particular device; if you set this field, browser_preset is ignored, so you don't need to specify it; minimum value: 0.5; maximum value: 3. Note: to use this parameter, set enable_javascript or enable_browser_rendering to true
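The preset and custom-dimension rules above interact, so it can help to resolve them locally before sending a task. The sketch below is a hypothetical client-side helper (effective_screen is not part of the API); it uses only the parameter names and limits documented above and treats the task as a plain dict.

```python
# Screen parameters each preset applies, as documented above.
BROWSER_PRESETS = {
    "desktop": {"browser_screen_width": 1920, "browser_screen_height": 1080, "browser_screen_scale_factor": 1},
    "mobile": {"browser_screen_width": 390, "browser_screen_height": 844, "browser_screen_scale_factor": 3},
    "tablet": {"browser_screen_width": 1024, "browser_screen_height": 1366, "browser_screen_scale_factor": 2},
}

# Documented (min, max) limits for custom screen parameters.
SCREEN_LIMITS = {
    "browser_screen_width": (240, 9999),   # pixels
    "browser_screen_height": (240, 9999),  # pixels
    "browser_screen_scale_factor": (0.5, 3),
}


def effective_screen(task: dict) -> dict:
    """Hypothetical helper: return the screen parameters a task resolves to, or raise ValueError."""
    custom = {key: task[key] for key in SCREEN_LIMITS if key in task}
    preset = task.get("browser_preset")
    if not custom and preset is None:
        return {}  # no screen parameters requested
    if not (task.get("enable_javascript") or task.get("enable_browser_rendering")):
        raise ValueError("screen parameters need enable_javascript or enable_browser_rendering set to true")
    for key, value in custom.items():
        low, high = SCREEN_LIMITS[key]
        if not low <= value <= high:
            raise ValueError(f"{key}={value} is outside [{low}, {high}]")
    if custom:
        # Custom values take precedence; browser_preset is ignored.
        return custom
    if preset not in BROWSER_PRESETS:
        raise ValueError(f"unknown browser_preset {preset!r}")
    return dict(BROWSER_PRESETS[preset])
```

For example, effective_screen({"browser_preset": "mobile", "enable_javascript": True}) returns the mobile preset's three values, while adding browser_screen_width to the same task returns only the custom width.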
store HTML of a crawled page (optional field): set to true if you want to retrieve the page's HTML with the OnPage Raw HTML endpoint; default value: false
disable the cookie popup (optional field): set to true to disable the popup that asks the user for cookie consent; default value: false
language header for accessing the website (optional field): all locale formats are supported (xx, xx-XX, xxx-XX, etc.). Note: if you do not specify this parameter, some websites may deny access; in this case, pages will be returned with "type":"broken" in the response array
load JavaScript on a page (optional field): set to true to load the scripts available on the page; default value: false. Note: using this parameter incurs additional charges; learn more about the cost of tasks with this parameter in our help article, and calculate the cost on the Pricing Page
emulate browser rendering to measure Core Web Vitals (optional field): emulates a browser when loading the web page; enable_browser_rendering loads styles, images, fonts, animations, videos, and other resources on the page; default value: false. Set to true to obtain Core Web Vitals (FID, CLS, LCP) metrics in the response. If you use this field, enable_javascript and load_resources must also be set to true. Note: using this parameter incurs additional charges; learn more about the cost of tasks with this parameter in our help article, and calculate the cost on the Pricing Page
enable XMLHttpRequest on a page (optional field): set to true if you want the crawler to request data from a web server using the XMLHttpRequest object; default value: false. If you use this field, enable_javascript must be set to true
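The rendering flags above depend on each other, and a task that breaks those dependencies may be rejected or billed unexpectedly. A minimal local check is sketched below; rendering_problems is a hypothetical helper, and the name "enable_xhr" for the XMLHttpRequest flag is an assumption, since this page does not show that field's name.

```python
def rendering_problems(task: dict) -> list:
    """Hypothetical helper: list dependency violations among a task's rendering flags."""
    problems = []
    if task.get("enable_browser_rendering"):
        # Browser rendering requires both of these flags to be true.
        for flag in ("enable_javascript", "load_resources"):
            if not task.get(flag):
                problems.append(f"enable_browser_rendering requires {flag}=true")
    # Assumption: "enable_xhr" is the name of the XMLHttpRequest flag.
    if task.get("enable_xhr") and not task.get("enable_javascript"):
        problems.append("enable_xhr requires enable_javascript=true")
    return problems
```

An empty list means the flags are consistent; anything else can be surfaced to the caller before the task is submitted.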
switch proxy pool (optional field): if true, additional proxy pools will be used to obtain the requested data; use this parameter when many tasks are set simultaneously and this results in occasional rate-limit and/or site_unreachable errors
proxy pool (optional field): location of the proxy pool that will be used to obtain the requested data; use this parameter if page content is inaccessible from one of the locations, resulting in occasional site_unreachable errors; possible values: us, de
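The two proxy parameters above suggest a simple escalation when a page keeps failing with rate-limit or site_unreachable errors: retry as-is, then with additional pools, then from each documented location. The sketch below only generates the task variants and does not send anything; the field names "switch_pool" and "ip_pool_for_scan" are assumptions, because this page describes the parameters without naming them.

```python
from typing import Iterator

# Documented proxy pool locations.
PROXY_LOCATIONS = ("us", "de")


def fallback_tasks(task: dict) -> Iterator[dict]:
    """Hypothetical helper: yield task variants to try in order after errors.

    Assumption: the proxy flags are named "switch_pool" and "ip_pool_for_scan".
    """
    yield dict(task)                      # first attempt: the task as specified
    yield {**task, "switch_pool": True}   # then: use additional proxy pools
    for location in PROXY_LOCATIONS:      # then: pin each proxy location in turn
        yield {**task, "switch_pool": True, "ip_pool_for_scan": location}
```

A caller would submit each variant in turn and stop at the first one that does not return a rate-limit or site_unreachable error.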
return page content as markdown (optional field): if set to true, the Markdown-formatted content of the page will be returned in the page_as_markdown field of the response; default value: false
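Putting the authorization header and body together, a request can be assembled with the standard library. This is a sketch under several assumptions: the endpoint URL below is a hypothetical placeholder (this page does not give it), the body is assumed to be a JSON array with one object per task, and "url" is assumed to be the name of the required URL field. The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the real endpoint URL for this API.
ENDPOINT = "https://api.example.com/on_page/content_parsing/live"


def build_request(token: str, url: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST request for one content-parsing task."""
    # Assumption: the body is a JSON array of task objects, one object per task.
    body = json.dumps([{"url": url, **options}]).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # Bearer authentication, as described above
            "Content-Type": "application/json",
        },
    )


req = build_request("YOUR_TOKEN", "https://www.fujielectric.com/", markdown_view=True)
```

To send it, pass req to urllib.request.urlopen and decode the body with json.load; optional parameters such as markdown_view are forwarded unchanged as keyword arguments.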
Response
Successful response
the current version of the API
general status code; the full list of response codes is available in the API documentation. Note: we strongly recommend designing a system for handling related exceptional or error conditions
general informational message; the full list of general informational messages is available in the API documentation
execution time, seconds
total tasks cost, USD
the number of tasks in the tasks array
the number of tasks in the tasks array returned with an error
array of tasks
task identifier: unique task identifier in our system, in UUID format
status code of the task, generated by DataForSEO; can be within the range 10000-60000; the full list of response codes is available in the API documentation
informational message of the task; the full list of general informational messages is available in the API documentation
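Because there is both a general status code and a per-task status code, a client usually checks both. The sketch below assumes the response has already been decoded from JSON, that the fields are named "status_code", "status_message", and "tasks", and that 20000 is the success code; none of these names or values are stated on this page, so treat them as assumptions to verify against the response-code list.

```python
# Assumption: 20000 is the success status code.
OK = 20000


def split_tasks(response: dict):
    """Hypothetical helper: separate successful tasks from failed ones."""
    if response.get("status_code") != OK:
        # The whole request failed; there is no per-task result to inspect.
        raise RuntimeError(f"request failed: {response.get('status_code')} {response.get('status_message')}")
    ok, failed = [], []
    for task in response.get("tasks") or []:
        (ok if task.get("status_code") == OK else failed).append(task)
    return ok, failed
```

Failed tasks keep their own status code and message, so they can be logged or retried individually.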
execution time, seconds
cost of the task, USD
number of elements in the result array
URL path
contains the same parameters that you specified in the POST request
array of results
status of the crawling session; possible values: in_progress, finished
details of the crawling session
number of items in the results array
items array
type of the returned item; value: "content_parsing_element"
date and time when the content was fetched, in UTC format: "yyyy-mm-dd hh:mm:ss +00:00"; example: "2022-11-01 10:02:52 +00:00"
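To reach the parsed items, a client walks the tasks, results, and items arrays and keeps items of type content_parsing_element. The sketch below also parses the fetch timestamp using the format documented above. The nesting path tasks[].result[].items[] and the field name "fetch_time" are assumptions, and parsing a "+00:00" offset with %z needs Python 3.7 or later.

```python
from datetime import datetime
from typing import Iterator, Tuple

ITEM_TYPE = "content_parsing_element"
# Documented fetch-time format, e.g. "2022-11-01 10:02:52 +00:00".
TIME_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def parsed_items(response: dict) -> Iterator[Tuple[datetime, dict]]:
    """Hypothetical helper: yield (fetch time, item) for each content-parsing item.

    Assumption: items sit at tasks[].result[].items[] and the time is in "fetch_time".
    """
    for task in response.get("tasks") or []:
        for result in task.get("result") or []:
            for item in result.get("items") or []:
                if item.get("type") == ITEM_TYPE:
                    yield datetime.strptime(item["fetch_time"], TIME_FORMAT), item
```

The returned datetimes are timezone-aware, so they can be compared directly with datetime.now(timezone.utc).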
status code of the page
parsed content of the page
parsed content of the header
primary content on the page; for details on how content priority is calculated, see the related help center article
content text
page URL displayed in case the text is a link anchor
contains other URLs and anchors found in the content element
other URL found in the content element
text of the URL’s anchor
secondary content on the page; for details on how content priority is calculated, see the related help center article
content text
page URL displayed in case the text is a link anchor
contains other URLs and anchors found in the content element
other URL found in the content element
text of the URL’s anchor
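Links appear in many places in the parsed content: a content element may itself be a link, and it may contain further URLs with anchors. Rather than hard-coding every path, the sketch below walks any nested structure and collects (url, anchor) pairs. It assumes the keys are named "url", "text", and "anchor", which this page does not state; pass it the page-content object only, because the task's echoed request data also contains a URL.

```python
def collect_links(node, found=None):
    """Hypothetical helper: recursively gather (url, anchor) pairs from parsed content."""
    if found is None:
        found = []
    if isinstance(node, dict):
        url = node.get("url")
        if isinstance(url, str):
            # Assumption: content elements keep their anchor in "text",
            # entries of the nested URL arrays keep it in "anchor".
            found.append((url, node.get("anchor", node.get("text"))))
        for value in node.values():
            collect_links(value, found)
    elif isinstance(node, list):
        for value in node:
            collect_links(value, found)
    return found
```

The traversal is depth-first, so links come back in document order within each element.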
content of the table on the page
content of the header of the table
content of the row cells of the header
text in the row cell
contains other URLs and anchors found in the cell
URL found in the cell
text of the URL’s anchor
indicates if the text belongs to the header
content of the body of the table
content of the row cells of the body
text in the row cell
contains other URLs and anchors found in the cell
URL found in the cell
text of the URL’s anchor
indicates if the text belongs to the header
content of the footer of the table
content of the row cells of the footer
text in the row cell
contains other URLs and anchors found in the cell
URL found in the cell
text of the URL’s anchor
indicates if the text belongs to the header
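A parsed table has header, body, and footer sections, each made of rows of cells. The sketch below flattens one table into rows of cell text and writes them as CSV. The key names "table_header", "table_body", "table_footer", "row_cells", and "text" are hypothetical, since this page describes the fields without naming them.

```python
import csv
import io


def table_rows(table: dict) -> list:
    """Hypothetical helper: flatten a table object into lists of cell texts, in section order."""
    rows = []
    for section in ("table_header", "table_body", "table_footer"):  # assumed key names
        for row in table.get(section) or []:
            rows.append([cell.get("text") for cell in row.get("row_cells") or []])
    return rows


def to_csv(rows: list) -> str:
    """Serialize rows with the csv module (default dialect, CRLF line endings)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()
```

Cell URLs and anchors are dropped here; they can be recovered separately with a link collector if needed.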
parsed content of the footer
main topic on the page; for details on how topic priority is calculated, see the related help center article
meta title
main title of the block
content author name
content language
HTML level
primary content on the page; for details on how content priority is calculated, see the related help center article
content text
page URL displayed in case the text is a link anchor
contains other URLs and anchors found in the content element
other URL found in the content element
text of the URL’s anchor
secondary content on the page; for details on how content priority is calculated, see the related help center article
secondary topic on the page; for details on how topic priority is calculated, see the related help center article
meta title
main title of the block
content author name
content language
HTML level
primary content on the page; for details on how content priority is calculated, see the related help center article
content text
page URL displayed in case the text is a link anchor
contains other URLs and anchors found in the content element
other URL found in the content element
text of the URL’s anchor
secondary content on the page; for details on how content priority is calculated, see the related help center article
contains objects with rating information for the products displayed on the page
rating name. Note: this field is not used in this particular object; its value is always null
the value of the rating
maximum value for the rating
the amount of feedback
relative rating; can take values from 0 to 1
array of products displayed on the page; each object contains information on one displayed product
name of the product
price of the product
price currency
date and time until which the price is valid, in UTC format: "yyyy-mm-dd hh:mm:ss +00:00"; example: "2022-11-01 10:02:52 +00:00"
array of comments displayed on the page; contains objects with information on comments related to the displayed products
product's rating; contains information about the rating a customer has given to the product
rating name. Note: this field is not used in this particular object; its value is always null
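Since each product's price carries an expiry timestamp, a client may want to keep only products whose price is still valid. The sketch below reuses the documented timestamp format; the key name "price_valid_until" is hypothetical, and products without an expiry are treated as still valid, which is a design choice rather than documented behavior.

```python
from datetime import datetime, timezone

# Documented timestamp format, e.g. "2022-11-01 10:02:52 +00:00".
TIME_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def currently_priced(products: list, now: datetime) -> list:
    """Hypothetical helper: keep products whose price has not expired at `now` (timezone-aware)."""
    kept = []
    for product in products:
        until = product.get("price_valid_until")  # assumed key name
        if until is None or datetime.strptime(until, TIME_FORMAT) >= now:
            kept.append(product)
    return kept


valid_now = currently_priced([], datetime.now(timezone.utc))
```

Passing a naive datetime as now raises TypeError when an expiry is present, which guards against silently mixing local time with UTC.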
the value of the rating
maximum value for the rating
the amount of feedback. Note: this field is not used in this particular object; its value is always null
relative rating; can take values from 0 to 1
title of the customer’s comment
date when the comment was published
author of the comment
primary content on the page; for details on how content priority is calculated, see the related help center article
text of the comment
page URL displayed in case the text is a link anchor
contains other URLs and anchors found in the content element
contact information; contains the contact details displayed on the page
array of telephone numbers
array of emails
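The same phone number or email often appears several times with different formatting. The sketch below merges contact objects and drops duplicates; the key names "telephones" and "emails" are hypothetical, and comparing emails case-insensitively is a simplification (strictly, the local part of an address may be case-sensitive).

```python
def merge_contacts(contact_blocks: list) -> dict:
    """Hypothetical helper: merge contact objects, dropping duplicate phones and emails.

    Assumption: each object holds "telephones" and "emails" lists of strings.
    """
    phones, emails = {}, {}
    for block in contact_blocks:
        for phone in block.get("telephones") or []:
            # Compare numbers by digits (and "+") so formatting differences collapse.
            key = "".join(ch for ch in phone if ch.isdigit() or ch == "+")
            phones.setdefault(key, phone)
        for email in block.get("emails") or []:
            # Case-insensitive comparison: a simplification.
            emails.setdefault(email.strip().lower(), email.strip())
    return {"telephones": list(phones.values()), "emails": list(emails.values())}
```

The first spelling seen for each number or address is the one kept, so output order follows the page.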
page content in Markdown (text-to-HTML) format; to return this value, set markdown_view to true in the request