Commit cb5bd628 authored by Protocole

Merge branch '314-links_checker-migrate-to-a-new-tool' into 'latest'

Resolve "Links_checker, migrate to a new tool"

Closes #314

See merge request r2devops/hub!353
parents b18f04b1 2124cde0
+8 −0
# Changelog
All notable changes to this job will be documented in this file.

## [2.0.0] - 2022-05-15
* 🚨 **BREAKING CHANGE**: `liche`-related variables are removed in favor of `lychee` variables. Please refer to version `1.0` to keep the deprecated behavior.
* Change tool to lychee instead of the deprecated liche
* New behaviour of variable `REPORT_OUTPUT`: if empty, the needed tools are not downloaded and no report is generated
* New variable `LYCHEE_EXCLUDE_LINKS` to exclude some links from the analysis
* Valid links are displayed in the console instead of in the report
* Recursive mode is no longer available

## [1.0.0] - 2022-04-14
* Change the default stage into `tests`

+27 −15
@@ -2,7 +2,7 @@

Using this job you will be able to detect most (see [here](#types-of-link-verified)) broken links in your **Markdown** or **HTML** files.

It uses the tool [`Liche`](https://github.com/raviqqe/liche){:target="_blank"} in [Go](https://golang.org/){:target="_blank"}
It uses the tool [`Lychee`](https://github.com/lycheeverse/lychee){:target="_blank"} in [Rust](https://rust-lang.org){:target="_blank"}
to test and find the links in your documents.
In its default state, this job will analyze your whole project for eligible files to verify.

@@ -22,7 +22,7 @@ In its default state, this job will analyze your whole project for eligible file

* Job name: `links_checker`
* Docker image:
[`peterevans/liche:1.1.1`](https://hub.docker.com/r/peterevans/liche){:target="_blank"}
[`lycheeverse/lychee:0.9`](https://hub.docker.com/r/lycheeverse/lychee){:target="_blank"}
* Default stage: `tests`
* When: `always`

@@ -30,20 +30,26 @@ In its default state, this job will analyze your whole project for eligible file

| Name | Description | Default |
| ---- | ----------- | ------- |
| `LICHE_DIRECTORY` <img width=450/> | Path to the directory to be scanned | ` ` |
| `LICHE_FILES` | A list of files (separated with spaces) to scan. It can be used with `LICHE_DIRECTORY` | ` ` |
| `LICHE_EXCLUDE` | A [regular expression](https://en.wikipedia.org/wiki/Regular_expression){:target="_blank"} to exclude a pattern of link | ` ` |
| `LICHE_PRINT_OK` | In addition to broken links, it will add not-broken links in the report (see [artifacts](#artifacts)) | `false` |
| `LICHE_RECURSIVE` | When `LICHE_DIRECTORY` is filled it will search for files recursively  | `true` |
| `LYCHEE_DIRECTORY` <img width=450/> | Path to the directory to be scanned | ` ` |
| `LYCHEE_FILES` | A list of files (separated with spaces) to scan. It can be used with `LYCHEE_DIRECTORY` | ` ` |
| `LYCHEE_EXCLUDE_LINKS` | A [regular expression](https://en.wikipedia.org/wiki/Regular_expression){:target="_blank"} to exclude a pattern of link | ` ` |
| `LYCHEE_EXCLUDE` | A [regular expression](https://en.wikipedia.org/wiki/Regular_expression){:target="_blank"} to exclude files or directories matching a pattern | ` ` |
| `LYCHEE_PRINT_OK` | Display valid (not broken) links in the console output | `false` |
| `FAIL_ON_BROKEN` | Make your pipeline fail when a broken link is found | `false` |
| `ROOT_DIRECTORY` | Used for absolute paths, it defines the root of HTML projects | ` ` |
| `LICHE_OPTIONS` | Additional options (see [options](https://github.com/raviqqe/liche){:target="_blank"}) | ` ` |
| `REPORT_OUTPUT` | Report file's name | `junit-report.xml` |
| `LYCHEE_OPTIONS` | Additional options (see [options](https://github.com/lycheeverse/lychee#commandline-parameters){:target="_blank"}) | ` ` |
| `ROOT_DIRECTORY` | Used for absolute paths, it defines the root of HTML projects | ` ` |
| `REPORT_OUTPUT` | Report file's name (see [artifacts](#artifacts)). If empty, no report is generated (which can speed up the job) | `junit-report.xml` |

!!! warning
    As this job is still in development, some behavior could be unexpected.
    For example, avoid using `LYCHEE_EXCLUDE` together with the `--include <link>` option: include takes precedence over all excludes, and `LYCHEE_EXCLUDE` relies on a hand-written `find` command.
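
The file filtering behind `LYCHEE_EXCLUDE` can be sketched as follows. This is a minimal sketch that reuses the `find` and `grep -v` loop from the job script; the temporary directory layout and the `node_modules` pattern are hypothetical and only serve the illustration.

```shell
# Hypothetical project layout created in a temporary directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/docs" "$workdir/node_modules/pkg"
touch "$workdir/README.md" "$workdir/docs/index.md" "$workdir/node_modules/pkg/README.md"

# Hypothetical value: space-separated patterns of paths to exclude.
LYCHEE_EXCLUDE="node_modules"

# List every file, then drop the paths matching each pattern,
# as the job script does before handing the list to lychee.
FILES=$(cd "$workdir" && find . -type f)
for file_exclude in ${LYCHEE_EXCLUDE}; do
  FILES=$(echo "${FILES}" | grep -v "${file_exclude}")
done

# Prints the two remaining paths (README.md and docs/index.md).
echo "${FILES}"
rm -rf "$workdir"
```

Because the filter works on file paths with `grep -v`, each pattern is matched anywhere in the path, so a short pattern may exclude more files than intended.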


### Types of link verified

This tool will check for links in a specific context, and so in your project some link formats may not be checked. However,
here is (a non-exhaustive) list of what `Liche` can or can't identify:
here is a (non-exhaustive) list of what `Lychee` can identify:

**In HTML files (`.html`, `.htm`):**
```HTML
@@ -54,9 +60,6 @@ Can identify:
* <a href="mailto:contact@google.com"></a>
* <img src="../images/logo.png"/>
* <img src="logo.png"/>

Can't identify:

* <div onClick="redirect('https://www.google.com')"></div>
* <script type="text/javascript">
      window.location.href = "https://www.google.com"
@@ -81,7 +84,16 @@ Can identify:
If you are using absolute paths in your HTML documents, be sure to fill the variable `ROOT_DIRECTORY`. If you don't, by default, the variable will be filled with the same path as `LICHE_DIRECTORY`.

If you use URL rewriting in your static website, using this job, most of the internal links will be considered as broken. To avoid that, you can define that you
only want to check external links, by using `LICHE_EXCLUDE: "^[^http]"` (see [regex](https://en.wikipedia.org/wiki/Regular_expression){:target="_blank"})
only want to check external links, by using `LYCHEE_EXCLUDE_LINKS: "^[^https?]"` (see [regex](https://en.wikipedia.org/wiki/Regular_expression){:target="_blank"})
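
To see which links such a pattern excludes, it can help to try it on a few samples. The sketch below uses `grep -E` only as a stand-in, assuming Lychee's regex engine reads a bracket expression the same way; the sample links are hypothetical. Note that `[^https?]` is a negated character class: it matches any link whose first character is not `h`, `t`, `p`, `s` or `?`, which approximates "does not start with http" without being equivalent to it.

```shell
# Hypothetical sample links, one per line.
links='https://example.com/page
http://example.com/other
/docs/index.html
../images/logo.png
mailto:contact@example.com
ftp://example.com/file
tel:+33123456789'

# grep -E prints the links matched by the pattern, i.e. the ones
# that would be excluded from the check.
excluded=$(printf '%s\n' "$links" | grep -E '^[^https?]')

# Prints the relative and absolute paths, the mailto: link and the ftp:// link.
# The tel: link is NOT matched because it starts with "t".
printf '%s\n' "$excluded"
```

If links such as `tel:` must also be excluded, a more explicit pattern is needed; which syntax is available depends on the regex engine used by the tool.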

### Filtering status codes and authentication

There are several methods to authenticate to a website with Lychee. As multiple methods are available, you need to choose the one that fits your case and override the `LYCHEE_OPTIONS` variable to define it. Here are some authentication cases:

* For basic authentication like `username:password`, use the option `--basic-auth`.
* If you need to access a URL that requires a header token to authenticate, like a Bearer token, you can use this syntax: `--headers 'Authorization: Bearer <token>'`
* Finally, you can avoid rate limiting on GitHub links by using this syntax: `--github-token <github-token>`.

If you still get some 503 status codes from websites that require authentication, you can ignore them by setting `LYCHEE_OPTIONS` to `-a 503`.
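
How the content of `LYCHEE_OPTIONS` ends up on the command line can be sketched as follows. This is a simplified sketch that reuses the `add_option` helper from the job script; the variable values are hypothetical, and the final `echo` only prints the command instead of running `lychee`.

```shell
# Accumulator used by the job to build the lychee command line.
LYCHEE_OPTIONS2=""
add_option() { export LYCHEE_OPTIONS2="${LYCHEE_OPTIONS2} ${1}"; }

# Hypothetical values a user could set in the job variables.
LYCHEE_PRINT_OK="true"
LYCHEE_EXCLUDE_LINKS="example.com"
LYCHEE_OPTIONS="-a 503"

# Scan the current directory, then append one option per filled variable.
add_option "."
[ "${LYCHEE_PRINT_OK}" = "true" ] && add_option "-v"
[ -n "${LYCHEE_EXCLUDE_LINKS}" ] && add_option "--exclude ${LYCHEE_EXCLUDE_LINKS}"

# Custom options come last, so they are appended after the generated ones.
add_option "${LYCHEE_OPTIONS}"

# Prints: lychee . -v --exclude example.com -a 503
echo "lychee${LYCHEE_OPTIONS2}"
```

Since the options are stored in a single string, values containing spaces (for instance a header) are split into several words when the string is expanded unquoted, so it is worth checking the echoed command in the job log.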

### Artifacts

@@ -91,4 +103,4 @@ directly in pipeline `Test` tab and in merge request widget.


### Author
This resource is an **[official job](https://docs.r2devops.io/faq-labels/)** added in [**R2Devops repository**](https://gitlab.com/r2devops/hub) by [@Protocole](https://gitlab.com/Protocole)
 No newline at end of file
This resource is an **[official job](https://docs.r2devops.io/faq-labels/)** added in [**R2Devops repository**](https://gitlab.com/r2devops/hub) by [@Protocole](https://gitlab.com/Protocole). It was updated by [@GridexX](https://gitlab.com/GridexX) in May 2022 to use a better tool.
 No newline at end of file
+103 −74
@@ -5,76 +5,105 @@ stages:

links_checker:
  image: 
        name: peterevans/liche:1.1.1
    name: lycheeverse/lychee:0.9
    entrypoint: [""]
  stage: tests
  variables:
        # Variables relative to LICHE tool
        ## Defines in which directory LICHE is looking for files
        LICHE_DIRECTORY: ""
        ## Defines which files it should check
        LICHE_FILES: ""
    # Variables relative to LYCHEE tool
    ## Defines in which directory LYCHEE is looking for files
    LYCHEE_DIRECTORY: "."
    ## Defines which files it should check, works with a pattern
    LYCHEE_FILES: ""
    ## Exclude links based on regex pattern
        LICHE_EXCLUDE: ""
        ## Add in the report the link which are fine
        LICHE_PRINT_OK: "false"
    LYCHEE_EXCLUDE_LINKS: ""
    ## List of files or directories to exclude from the lychee directory
    LYCHEE_EXCLUDE: ""
    ## Print valid links in the console
    ## TODO Add ok and skipped links in tests report
    LYCHEE_PRINT_OK: "false"
    ## Custom options
        LICHE_OPTIONS: ""
        ## For a directory defined, search in sub folders for files
        LICHE_RECURSIVE: "true"
        ## Fails the pipeline if LICHE finds a broken link
        FAIL_ON_BROKEN: "false"
        ##
    LYCHEE_OPTIONS: ""
    ## Fails the pipeline if LYCHEE finds a broken link
    FAIL_ON_BROKEN: "true"

    ## Base URL or website root directory to check relative URLs
    ROOT_DIRECTORY: ""

        # Defines the name of the report
    ## Defines the name of the report; if empty, no report is generated
    REPORT_OUTPUT: "junit-report.xml"
  script:
        - mkdir /liche && cd /liche
        - apk add --update nodejs npm curl && npm install junit-report-builder
    - | 
        if [ ! -z "${REPORT_OUTPUT}" ]; then
          apt update && apt upgrade -y && apt install -y curl nodejs npm && npm install junit-report-builder  
        fi

        - add_option() { export LICHE_OPTIONS="${LICHE_OPTIONS} ${1}"; }
    - add_option() { export LYCHEE_OPTIONS2="${LYCHEE_OPTIONS2} ${1}"; }
    - export ERROR_FILE=${CI_PROJECT_DIR}/errors
    # Strip blank lines and the first and last lines from the error file, then append EOF so the file can be parsed
    - |
        generate_report() {
                cat ${CI_PROJECT_DIR}/linkchecker_logs
                echo "EOF" >> ${CI_PROJECT_DIR}/linkchecker_logs
                curl -s -o /liche/main.cjs https://gitlab.com/r2devops/hub/-/snippets/2044617/raw/master/main.cjs
                node main.cjs "${CI_PROJECT_DIR}" "${CI_PROJECT_DIR}/linkchecker_logs" "${REPORT_OUTPUT}"
          file=$(sed -e '/^$/d' ${ERROR_FILE} | tail -n +2 | head -n -1)
          echo -e "${file}\nEOF" > ${ERROR_FILE}
          curl -s -o main.cjs https://gitlab.com/r2devops/hub/-/snippets/2318077/raw/main/main.cjs
          node main.cjs "${CI_PROJECT_DIR}" "${ERROR_FILE}" "${REPORT_OUTPUT}"
          mv ${REPORT_OUTPUT} ${CI_PROJECT_DIR}/${REPORT_OUTPUT}
        }

    - | 
            if [ ! -d ${CI_PROJECT_DIR}/${LICHE_DIRECTORY} ]; then
                echo "Directory specified ${CI_PROJECT_DIR}/${LICHE_DIRECTORY} does not exist, exit"
        if [ ! -d "${LYCHEE_DIRECTORY}" ]; then
          echo "Directory specified ${LYCHEE_DIRECTORY} does not exist, exit"
          exit 1
        else
          cd ${LYCHEE_DIRECTORY}
        fi
            if [ -z ${ROOT_DIRECTORY} ]; then
                export ROOT_DIRECTORY=${LICHE_DIRECTORY}
            fi
            add_option "${CI_PROJECT_DIR}/${LICHE_DIRECTORY} -d ${CI_PROJECT_DIR}/${ROOT_DIRECTORY}"; 

    # exclude files from files to scan, works with directory and file pattern
    - |
            if [ ! -z "${LICHE_FILES}" ]; then
                for i in ${LICHE_FILES}; do
                    if [ ! -f ${i} ]; then
                        echo "File ${i} does not exist, exit";
                        exit 1;
        if [ ! -z "${LYCHEE_EXCLUDE}" ]; then 
          FILES=""
          if [ ! -z "${LYCHEE_FILES}" ]; then
            FILES=$(find ${LYCHEE_FILES})
          else
            FILES=$(find . -type f)
          fi

                    add_option "${CI_PROJECT_DIR}/${i}"
          for file_exclude in ${LYCHEE_EXCLUDE}; do
            FILES=$(echo "${FILES}" | grep -v "${file_exclude}")
          done
          add_option "${FILES}"

          else
            if [ ! -z "${LYCHEE_FILES}" ]; then
              add_option "${LYCHEE_FILES}"
            else
                add_option "."
            fi
        fi
        - if [ ! -z ${LICHE_EXCLUDE} ]; then add_option "-x ${LICHE_EXCLUDE}"; fi
        - if [ ${LICHE_PRINT_OK} = "true" ]; then add_option "-v"; fi
        - if [ ${LICHE_RECURSIVE} = "true" ]; then add_option "-r"; fi


    - $([ "${LYCHEE_PRINT_OK}" == "true" ]) && add_option "-v"
    - $([ ! -z "${LYCHEE_EXCLUDE_LINKS}" ]) && add_option "--exclude ${LYCHEE_EXCLUDE_LINKS}"
    - $([ ! -z "${ROOT_DIRECTORY}" ]) && add_option "-b ${ROOT_DIRECTORY}"
    - add_option "${LYCHEE_OPTIONS}"
    - echo "${LYCHEE_OPTIONS2}"

    - | 
            if liche ${LICHE_OPTIONS} > ${CI_PROJECT_DIR}/linkchecker_logs 2>&1; then
                generate_report;
                echo "No errors so far in in the checked files";
            else
                generate_report;
                echo "Errors found in checked files";
                if [ ${FAIL_ON_BROKEN} = "true" ]; then exit 1; fi
        ARE_LINKS_VALID="false"
        if lychee ${LYCHEE_OPTIONS2} 2> ${CI_PROJECT_DIR}/logs 1> ${ERROR_FILE}; then
          echo "No errors so far in the checked files";
          ARE_LINKS_VALID="true"
        fi
        if [ "${LYCHEE_PRINT_OK}" == "true" ]; then
          cat ${CI_PROJECT_DIR}/logs
        fi
        cat ${ERROR_FILE}
        if [ ! -z "${REPORT_OUTPUT}" ]; then
          generate_report
        fi
        if [ "${ARE_LINKS_VALID}" == "false" ]; then
          echo "Errors found in the checked files";
          if [ "${FAIL_ON_BROKEN}" == "true" ]; then
            exit 1;
          fi
        fi
  artifacts:
    when: always