OnPage API Duplicate Content
This endpoint returns a list of pages that have content similar to the page specified in the request.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
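A minimal sketch of building the header described above, assuming the token travels in the standard HTTP `Authorization` header. The helper name `bearer_headers` and the JSON `Content-Type` entry are illustrative assumptions, not part of this reference.

```python
def bearer_headers(token):
    """Return request headers carrying a Bearer token."""
    token = token.strip()
    if not token:
        raise ValueError("auth token must not be empty")
    return {
        "Authorization": f"Bearer {token}",
        # Assumption: the request body is sent as JSON.
        "Content-Type": "application/json",
    }
```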
Body
ID of the task; required field; you can get this ID in the response of the Task POST endpoint; example: “07131248-1535-0216-1000-17384017ad04”
page URL; required field; specify the initial page you want to receive duplicate content for
content similarity score; by default, the content is considered duplicate if the value is greater than or equal to 6; you can specify any similarity score in the 0-to-10 range
the maximum number of returned pages; optional field; default value: 100; maximum value: 1000
offset in the results array of returned pages; optional field; default value: 0; if you specify the value 10, the first ten pages in the results array will be omitted and the data will be provided for the subsequent pages
user-defined task identifier; optional field; the character limit is 255; you can use this parameter to identify the task and match it with the result; you will find the specified tag value in the data object of the response
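The limits listed above can be checked before a request is sent. The following is only a sketch: this page lists the body parameters by description, so the key names used here (`id`, `url`, `similarity`, `limit`, `offset`, `tag`) are assumptions, as is the lower bound of 1 for the page limit. How the body is wrapped and sent is outside the sketch.

```python
def build_duplicate_content_body(task_id, url, similarity=6,
                                 limit=100, offset=0, tag=None):
    """Validate the documented limits and return the request body as a dict.

    Assumption: the key names are placeholders chosen for illustration.
    """
    if not task_id or not url:
        raise ValueError("task ID and page URL are required")
    if not 0 <= similarity <= 10:
        raise ValueError("similarity must be in the 0-to-10 range")
    # Assumption: at least one page must be requested; the documented maximum is 1000.
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    if offset < 0:
        raise ValueError("offset must not be negative")
    body = {
        "id": task_id,
        "url": url,
        "similarity": similarity,
        "limit": limit,
        "offset": offset,
    }
    if tag is not None:
        if len(tag) > 255:
            raise ValueError("tag is limited to 255 characters")
        body["tag"] = tag
    return body
```

For example, `build_duplicate_content_body(task_id, page_url, offset=10)` requests results starting from the eleventh page in the results array.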
Response
Successful response
the current version of the API
general status code; you can find the full list of the response codes here; Note: we strongly recommend designing a system for handling the related exceptional or error conditions
general informational message you can find the full list of general informational messages here
execution time, seconds
total tasks cost, USD
the number of tasks in the tasks array
the number of tasks in the tasks array returned with an error
array of tasks
task identifier unique task identifier in our system in the UUID format
status code of the task generated by DataForSEO; can be within the following range: 10000-60000 you can find the full list of the response codes here
informational message of the task you can find the full list of general informational messages here
execution time, seconds
cost of the task, USD
number of elements in the result array
URL path
contains the same parameters that you specified in the POST request
array of results
status of the crawling session possible values: in_progress, finished
details of the crawling session
maximum number of pages to crawl indicates the max_crawl_pages limit you specified when setting a task
number of pages that are currently in the crawling queue
number of crawled pages
number of items in the results array
items array
URL of the specified page
total count of duplicate pages
pages with duplicate content
content similarity score; by default, the content is considered duplicate if the value is greater than or equal to 6; can take values from 0 to 10
information about the page with duplicate content
type of the returned resource = ‘html’
status code of the page
location header indicates the URL to redirect a page to
page URL
page properties the value depends on the resource_type
page title
code page example: 65001
indicates whether a page’s ‘meta robots’ allows crawlers to follow the links on the page if false, the page’s ‘meta robots’ tag contains “nofollow” parameter instructing crawlers not to follow the links on the page
meta tag generator
HTML header tags
content of the meta description tag
favicon of the page
content of the keywords meta tag
canonical page
number of internal links on the page
number of external links on the page
number of internal links pointing at the page
number of images on the page
total size of images on the page measured in bytes
number of scripts on the page
total size of scripts on the page measured in bytes
number of stylesheets on the page
total size of stylesheets on the page measured in bytes
length of the title tag in characters
length of the description tag in characters
number of scripts on the page that block page rendering
number of CSS styles on the page that block page rendering
Core Web Vitals metric measuring the layout stability of a page; measures the sum total of all individual layout shift scores for every unexpected layout shift that occurs during the entire lifespan of the page
overall information about content of the page
total size of the text on the page measured in bytes
plaintext rate value plain_text_size to size ratio
number of words on the page
Automated Readability Index
Coleman–Liau Index
Dale–Chall Readability Index
Flesch–Kincaid Readability Index
SMOG Readability Index
consistency of the meta description tag with the page content measured from 0 to 1
consistency of the meta title tag with the page content measured from 0 to 1
consistency of the meta keywords tag with the page content measured from 0 to 1
deprecated tags on the page
duplicate meta tags on the page
spellcheck hunspell spellcheck errors
spellcheck language code
array of misspelled words
misspelled word
resource errors and warnings
resource errors
line where the error was found
text message of the error the full list of possible HTML errors can be found here
resource warnings
line the warning relates to note that if "line": 0, the warning relates to the whole page
text message of the warning; possible messages: "Has node with more than 60 childs." (HTML page has at least 1 tag nesting over 60 tags of the same level); "Has more that 1500 nodes." (DOM tree contains over 1,500 elements); "HTML depth more than 32 tags." (DOM depth exceeds 32 nodes)
array of social media tags found on the page contains social media tags and their content supported tags include but are not limited to Open Graph and Twitter card
object of page load metrics
Time To Interactive (TTI) metric the time it takes until the user can interact with a page (in milliseconds)
time to load resources the time it takes until the page and all of its subresources are downloaded (in milliseconds)
Core Web Vitals metric measuring how fast the largest above-the-fold content element is displayed; the amount of time (in milliseconds) to render the largest content element visible in the viewport, from when the user requests the URL
Core Web Vitals metric indicating the responsiveness of a page; the time (in milliseconds) from when a user first interacts with your page to the time when the browser responds to that interaction
time to connect to a server the time it takes until the connection with a server is established (in milliseconds)
time to establish a secure connection the time it takes until the secure connection with a server is established (in milliseconds)
time to send a request to a server the time it takes until the request to a server is sent (in milliseconds)
time to first byte (TTFB) in milliseconds
time it takes for a browser to receive a response (in milliseconds)
total time it takes until a browser receives a complete response from a server (in milliseconds)
time to start downloading the HTML resource the amount of time the browser needs to start downloading a page
time to complete downloading the HTML resource the amount of time the browser needs to complete downloading a page
shows how well the page is optimized on a 100-point scale; the score takes into account the critical on-page issues and warnings detected; 100 is the highest possible score and means the page has no critical on-page issues or important warnings; learn more about how the metric is calculated in this help center article
total DOM size of a page
the result of executing a specified JS script; note that you should specify a custom_js field when setting a task to receive this data, and the field type and its value will depend entirely on the script you specified; you can also filter the results by this value, specifying filters in the following way: ["custom_js_response.url", "like", "pixel"]
error when executing a custom js if the error occurred when executing the script you specified in the custom_js field, the error message would be displayed here
indicates whether a page contains broken resources
indicates whether a page contains broken links
indicates whether a page has duplicate title tags
indicates whether a page has a duplicate description
indicates whether a page has duplicate content
number of clicks it takes to get to the page indicates the number of clicks from the homepage needed before landing at the target page
resource size indicates the size of a given page measured in bytes
page size after encoding indicates the size of the encoded page measured in bytes
compressed page size indicates the compressed size of a given page
date and time when a resource was fetched in the UTC format: “yyyy-mm-dd hh-mm-ss +00:00” example: 2019-11-15 12:57:46 +00:00
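The timestamp can be parsed with the standard library. This sketch follows the example value, which separates the time components with colons even though the format string above shows hyphens. The function name `parse_fetch_time` is illustrative.

```python
from datetime import datetime


def parse_fetch_time(value):
    """Parse a value such as '2019-11-15 12:57:46 +00:00'; None is passed through."""
    if value is None:
        return None
    # %z accepts a colon-separated UTC offset on Python 3.7 and later.
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S %z")
```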
instructions for caching
indicates whether the page is cacheable
time to live the amount of time the browser caches a resource
website checks on-page check-ups related to the page
page with no content encoding indicates whether a page has no compression algorithm of the content
page with high loading time indicates whether a page loading time exceeds 3 seconds
page with redirects indicates whether a page has 3XX redirects to other pages
page with 4xx status codes indicates whether a page has 4xx response code
page with 5xx status codes indicates whether a page has 5xx response code
broken page indicates whether a page returns a response code less than 200 or greater than 400
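The status-code checks above can be reproduced locally from a page's status code. This is a sketch of the rules as worded here, not the API's implementation, and the key names are hypothetical.

```python
def classify_status(code):
    """Apply the redirect, 4xx, 5xx, and broken-page rules to an HTTP status code."""
    return {
        "is_redirect": 300 <= code <= 399,
        "is_4xx_code": 400 <= code <= 499,
        "is_5xx_code": 500 <= code <= 599,
        # Rule as documented: less than 200 or greater than 400.
        "is_broken": code < 200 or code > 400,
    }
```

Read literally, a 400 response counts as a 4xx page but not as a broken page; whether the API treats it that way is not confirmed by this reference.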
page with www indicates whether a page is on a www subdomain
page with the https protocol
page with the http protocol
page with high waiting time indicates whether a page waiting time (aka Time to First Byte) exceeds 1.5 seconds
page with no doctype; indicates whether a page is without the <!DOCTYPE html> declaration
page is canonical
page with no meta tag encoding indicates whether a page is without Content-Type; informative only if the encoding is not explicit in the Content-Type header; for example: Content-Type: "text/html; charset=utf8"; Note: available for pages with canonical check set to true
page with empty or absent h1 tags Note: available for pages with canonical check set to true
HTTPS page has links to HTTP pages if true, this HTTPS page has links to HTTP pages Note: available for pages with canonical check set to true
page with HTML doctype declaration if true, the page has HTML DOCTYPE declaration
page with size larger than 3 MB if true, the page size is exceeding 3 MB; Note: available for pages with canonical check set to true
consistency between charset encoding and page charset if true, the page’s charset encoding doesn’t match the actual charset of the page; Note: available for pages with canonical check set to true
pages with meta refresh redirect; if true, the page has a meta refresh tag that instructs a browser to load another page after a specified time span; Note: available for pages with canonical check set to true
page with render-blocking resources if true, the page has render-blocking scripts or stylesheets; Note: available for pages with canonical check set to true
page with multiple redirects if true, there were at least two redirects before our crawler reached this page
page with low content rate indicates whether a page has the plaintext size to page size ratio of less than 0.1; Note: available for pages with canonical check set to true
page with high content rate indicates whether a page has the plaintext size to page size ratio of more than 0.9; Note: available for pages with canonical check set to true
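The two content-rate checks follow directly from the plaintext-to-page-size ratio. A minimal sketch, with hypothetical key names and both sizes given in bytes:

```python
def content_rate_checks(plain_text_size, size):
    """Compute the plaintext rate and the low/high content rate flags."""
    if size <= 0:
        raise ValueError("page size must be positive")
    rate = plain_text_size / size
    return {
        "plain_text_rate": rate,
        "low_content_rate": rate < 0.1,   # ratio below 0.1
        "high_content_rate": rate > 0.9,  # ratio above 0.9
    }
```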
indicates whether the page has less than 1024 characters Note: available for pages with canonical check set to true
indicates whether the page has more than 256,000 characters Note: available for pages with canonical check set to true
indicates whether a page is too small the value will be true if a page size is smaller than 1024 bytes; Note: available for pages with canonical check set to true
indicates whether a page is too heavy the value will be true if a page size exceeds 1 megabyte; Note: available for pages with canonical check set to true
page with a low readability rate indicates whether a page is scored less than 15 points on the Flesch–Kincaid readability test; Note: available for pages with canonical check set to true
page with irrelevant description indicates whether a page description tag is irrelevant to the content of a page; the relevance threshold is 0.2; Note: available for pages with canonical check set to true
page with irrelevant title indicates whether a page title tag is irrelevant to the content of the page the relevance threshold is 0.3 Note: available for pages with canonical check set to true
page with irrelevant meta keywords indicates whether a page keywords tags are irrelevant to the content of a page the relevance threshold is 0.6 Note: available for pages with canonical check set to true
page with a long title indicates whether the content of the title tag exceeds 65 characters; Note: available for pages with canonical check set to true
page with short titles indicates whether the content of title tag is shorter than 30 characters; Note: available for pages with canonical check set to true
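The two title-length thresholds can be sketched as follows. The key names are hypothetical, and counting characters with Python's `len` is an assumption about how the length is measured.

```python
def title_length_checks(title):
    """Flag titles longer than 65 characters or shorter than 30 characters."""
    length = len(title)
    return {
        "title_too_long": length > 65,
        "title_too_short": length < 30,
    }
```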
page with deprecated tags indicates whether a page has deprecated HTML tags; Note: available for pages with canonical check set to true
page with duplicate meta tags indicates whether a page has more than one meta tag of the same type; Note: available for pages with canonical check set to true
page with more than one title tag indicates whether a page has more than one title tag; Note: available for pages with canonical check set to true
images without alt tags Note: available for pages with canonical check set to true
images without title tags Note: available for pages with canonical check set to true
pages with no description indicates whether a page has an empty or absent description meta tag; Note: available for pages with canonical check set to true
page with no title indicates whether a page has an empty or absent title tag; Note: available for pages with canonical check set to true
page with no favicon Note: available for pages with canonical check set to true
page with SEO-friendly URL; the ‘SEO-friendliness’ of a page URL is checked by four parameters: the length of the relative path is less than 120 characters; no special characters; no dynamic parameters; relevance of the URL to the page; if at least one of these checks fails, the URL is considered not ‘SEO-friendly’; Note: available for pages with canonical check set to true
page with flash indicates whether a page has flash elements
page with frames indicates whether a page contains frame, iframe, frameset tags
page with lorem ipsum indicates whether a page has lorem ipsum content; Note: available for pages with canonical check set to true
page with misspelled content
URL characters check-up; indicates whether a page URL contains only uppercase and lowercase Latin characters, digits and dashes
URL dynamic check-up the value will be true if a page has no dynamic parameters in the url
URL keyword check-up indicates whether a page URL is consistent with the title meta tag
URL length check-up; the value will be true if a page URL is no longer than 120 characters
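Three of the four URL check-ups can be approximated locally; the keyword check-up needs the page title and is left out. This sketch makes several assumptions that the reference does not confirm: the character rule is applied to each path segment, the length rule is applied to the path and uses the "no longer than 120 characters" wording, and a non-empty query string is what counts as dynamic parameters. The key names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Latin letters, digits, and dashes only.
_ALLOWED_SEGMENT = re.compile(r"[A-Za-z0-9-]+")


def url_checks(url):
    """Approximate the characters, dynamic-parameter, and length check-ups."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    return {
        "characters_ok": all(_ALLOWED_SEGMENT.fullmatch(s) for s in segments),
        "no_dynamic_params": parts.query == "",
        "length_ok": len(parts.path) <= 120,
    }
```

Under these assumptions a path such as `/page.html` fails the characters check because of the dot; the API may treat such URLs differently.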
page with no internal links pointing to it true if the page has no reference from other pages of the domain
mix of both followed and nofollowed incoming internal links true if the page receives at least one link with the rel="nofollow" attribute and at least one dofollow link
page is pointing to a page that redirects elsewhere; true if the page is pointing to a page that responds with a 3XX redirect
recursive canonical error true if the page contains rel="canonical" tag to another page, which in turn, refers back to the initial page
pages with canonical pointing to a page that has a canonical pointing elsewhere true if the page has a canonical link element pointing to a page that has a canonical pointing to a different page e.g. page a is canonicalized to page b, which is canonicalized to page c
canonical page pointing to a page that redirects elsewhere; true if the page has a canonical link element pointing to a page that responds with a 3XX redirect
canonical link pointing to a broken page; true if the page has a canonical link pointing to a page that responds with a 4xx or 5xx response code
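The recursive-canonical and canonical-chain checks can be illustrated on a plain mapping from each page URL to its canonical URL. This is a sketch with hypothetical names; treating the two flags as mutually exclusive is a choice made here, not something the reference states.

```python
def canonical_issues(page, canonicals):
    """Detect a recursive canonical (a -> b -> a) or a canonical chain (a -> b -> c).

    `canonicals` maps a page URL to its canonical URL; pages without a
    canonical are absent or map to None.
    """
    target = canonicals.get(page)
    if target is None or target == page:
        return {"recursive_canonical": False, "canonical_chain": False}
    next_target = canonicals.get(target)
    points_elsewhere = next_target is not None and next_target != target
    recursive = points_elsewhere and next_target == page
    return {
        "recursive_canonical": recursive,
        "canonical_chain": points_elsewhere and not recursive,
    }
```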
type of encoding
types of media used to display a page
server version
indicates whether a page is a single resource
contains data on changes related to the resource if there is no data, the value will be null
date and time when the header was last modified in the UTC format: “yyyy-mm-dd hh-mm-ss +00:00” example: 2019-11-15 12:57:46 +00:00 if there is no data, the value will be null
date and time when the sitemap was last modified in the UTC format: “yyyy-mm-dd hh-mm-ss +00:00” example: 2019-11-15 12:57:46 +00:00 if there is no data, the value will be null
date and time when the meta tag was last modified in the UTC format: “yyyy-mm-dd hh-mm-ss +00:00” example: 2019-11-15 12:57:46 +00:00 if there is no data, the value will be null
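The response described above can be walked defensively, checking the general and per-task status codes before reading the duplicate pages. This is only a sketch: the field names (`status_code`, `status_message`, `tasks`, `result`, `items`, `pages`, `similarity`, `page`, `url`) are assumptions because this page lists fields by description only, and the success code 20000 is likewise an assumption that should be checked against the list of response codes.

```python
OK_STATUS = 20000  # Assumption: the code that signals success.


class ApiError(Exception):
    """Raised when the response or one of its tasks reports a non-success status."""


def collect_duplicates(response, min_similarity=6):
    """Return (url, similarity) pairs for pages at or above the similarity threshold."""
    if response.get("status_code") != OK_STATUS:
        raise ApiError(response.get("status_message", "request failed"))
    found = []
    for task in response.get("tasks") or []:
        if task.get("status_code") != OK_STATUS:
            raise ApiError(task.get("status_message", "task failed"))
        for result in task.get("result") or []:
            for item in result.get("items") or []:
                for duplicate in item.get("pages") or []:
                    score = duplicate.get("similarity", 0)
                    if score < min_similarity:
                        continue
                    # The page information may be one object or a list of objects.
                    pages = duplicate.get("page") or []
                    if isinstance(pages, dict):
                        pages = [pages]
                    for page in pages:
                        found.append((page.get("url"), score))
    return found
```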