Wednesday, October 5, 2022

jmap: dump the heap of a Java process

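Example (replace pid with the target JVM's process ID; the live suboption dumps only live objects, and format=b writes the binary hprof format):

jmap -dump:live,format=b,file=heap.bin pid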

https://docs.oracle.com/en/java/javase/18/docs/specs/man/jmap.html
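jmap dumps another process from the outside. A JVM can also dump its own heap, which is handy when you want a dump taken at a specific point in your code. The sketch below assumes a HotSpot-based JDK, where com.sun.management.HotSpotDiagnosticMXBean.dumpHeap(String, boolean) writes the same hprof format; the HeapDumper class and the temp-directory location are illustrative. As far as I know, recent JDKs require the file name to end in .hprof and refuse to overwrite an existing file, so the sketch writes heap.hprof into a fresh temporary directory.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumper {
    // Dumps the heap of the *current* JVM in hprof format, like jmap -dump does for another process.
    // live = true writes only reachable objects (the equivalent of jmap's "live" suboption).
    static Path dumpHeap(Path outputFile, boolean live) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diagnostics.dumpHeap(outputFile.toString(), live);
        return outputFile;
    }

    public static void main(String[] args) throws IOException {
        // Use a fresh directory: dumpHeap fails if the target file already exists.
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dumpHeap(dir.resolve("heap.hprof"), true);
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

The resulting file can be opened with the same tools you would use for a jmap dump.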

Saturday, November 27, 2021

Search Engine Optimization Starter Guide

Hi everyone,

Websites still play an important role in introducing products to users. Large companies such as Amazon, Thế Giới Di Động, Shopee, and Tiki all use their websites to reach customers. Building a website is hard, but promoting it to customers is just as hard. In this post I would like to introduce the basics of helping a website show up better in the results of search engines such as Google.

Contents

Introduction to Search Engine Optimization
B1) Crawling
B2) Indexing
B3) Serving (and ranking)
Remarks
References

Introduction to Search Engines

A search engine is a tool that helps people find content on the internet. Google Search, or simply Google, is a search engine developed by Google. Google currently holds most of the search market, so this post covers only Google.
Search Engine Optimization (SEO) is the process of improving a website's position in search engine results.
Why do SEO? A higher position in search results lets a site reach more users and customers, which in turn means more opportunities for revenue. A high ranking also signals credibility, because it shows that the search engine rates the site highly.

There are three basic steps between Google collecting pages and returning search results to users: B1) Crawling, B2) Indexing, and B3) Serving (and ranking).

A necessary condition for a page to appear in Google's search results is that it has been processed through step 2 (indexed). Whether it actually appears in step 3 depends on the quality of the page's content and on Google's algorithms.

B1) Crawling

Google uses a large number of computers to crawl web pages on the internet. The program that does this is called Googlebot (hereafter simply "Google"). While crawling, Google renders pages with a recent version of Chrome, so it can also run the JavaScript on the page.

How does Google know which URLs to crawl?

URLs linked from pages it has already crawled
URLs submitted by site owners through a sitemap or through Google Search Console

Which URLs will not be crawled?

URLs blocked in the robots.txt file
URLs that require login credentials or cannot be accessed directly from the internet
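To make the robots.txt rule concrete, here is a deliberately simplified checker in Java. It is a sketch, not Google's parser: it assumes plain path prefixes (no * or $ wildcards), exact user-agent names, and that the longest matching rule wins with Allow winning a tie, which is my understanding of how Google resolves conflicts. RobotsCheck and isAllowed are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class RobotsCheck {
    // Simplified robots.txt check (hypothetical helper, not Google's parser):
    // plain path prefixes only, exact user-agent names, longest match wins, Allow wins ties.
    static boolean isAllowed(String robotsTxt, String userAgent, String path) {
        String agent = userAgent.toLowerCase();
        List<String[]> specific = new ArrayList<>();   // rules from groups that name this agent
        List<String[]> wildcard = new ArrayList<>();   // rules from "User-agent: *" groups
        List<String> groupAgents = new ArrayList<>();
        boolean sawSpecificGroup = false;
        boolean lastWasAgent = false;

        for (String raw : robotsTxt.split("\\R")) {
            String line = raw.replaceAll("#.*", "").trim();   // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String key = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            if (key.equals("user-agent")) {
                if (!lastWasAgent) groupAgents.clear();      // a new group begins
                groupAgents.add(value.toLowerCase());
                if (value.equalsIgnoreCase(agent)) sawSpecificGroup = true;
                lastWasAgent = true;
            } else if (key.equals("allow") || key.equals("disallow")) {
                lastWasAgent = false;
                if (value.isEmpty()) continue;                // "Disallow:" alone blocks nothing
                String[] rule = {key, value};
                if (groupAgents.contains(agent)) specific.add(rule);
                else if (groupAgents.contains("*")) wildcard.add(rule);
            }
        }

        // A crawler obeys only the group that names it; otherwise it falls back to "*".
        List<String[]> rules = sawSpecificGroup ? specific : wildcard;
        boolean allowed = true;                               // no matching rule: allowed
        int bestLength = -1;
        for (String[] rule : rules) {
            String prefix = rule[1];
            if (!path.startsWith(prefix)) continue;
            boolean isAllow = rule[0].equals("allow");
            if (prefix.length() > bestLength || (prefix.length() == bestLength && isAllow)) {
                bestLength = prefix.length();
                allowed = isAllow;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\n"
                + "Disallow: /admin/\n"
                + "Allow: /admin/public/\n";
        System.out.println(isAllowed(robots, "Googlebot", "/admin/users"));    // false
        System.out.println(isAllowed(robots, "Googlebot", "/admin/public/a")); // true
        System.out.println(isAllowed(robots, "Googlebot", "/products/1"));    // true
    }
}
```

Keep in mind that robots.txt controls crawling, not indexing: Google's documentation notes that a blocked URL can still appear in results (without a description) if other pages link to it.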

Improving crawling for your site

Google provides Google Search Console so that site owners can view information about, and take actions related to, B1), B2), and B3). Its reports show crawl timing, frequency, number of URLs, errors, and so on.

Some tasks to improve and check crawling:

Check the server logs to find the cause of any errors and fix them.

Submit a sitemap so Google knows the URLs on your site. Left to itself, Google may take a long time to discover every URL on your site; with a sitemap it can crawl your content sooner.
When you publish a new article, you can submit its URL individually so that Google crawls it right away (this does not guarantee that it will be indexed).
Use the URL Inspection tool to check whether Google can access the page successfully and whether it failed to load static resources such as images, JS, or CSS (the MORE INFO tab in the figure below). Also check whether Google renders the page the way users see it (the SCREENSHOT tab). Errors at this step can make Google misunderstand the content and rate the page lower.
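A sitemap is just an XML file that lists URLs in the sitemaps.org format. The Java sketch below, assuming a plain list of absolute URLs (the example.com addresses are placeholders), writes the minimal form with one url/loc entry per page and escapes characters such as & as XML requires. Optional tags such as lastmod are left out. The protocol caps a single sitemap at 50,000 URLs and 50 MB uncompressed; larger sites split it and list the parts in a sitemap index. SitemapWriter is a hypothetical name.

```java
import java.util.List;

public class SitemapWriter {
    // Escapes the XML special characters; <loc> values must be entity-escaped.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Builds a minimal sitemap in the sitemaps.org 0.9 format: one <url><loc> per page.
    static String buildSitemap(List<String> urls) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String url : urls) {
            sb.append("  <url><loc>").append(xmlEscape(url)).append("</loc></url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildSitemap(List.of(
                "https://example.com/",
                "https://example.com/products?id=1&color=red")));
    }
}
```

Once the file is published, submit it in Search Console or reference it from robots.txt with a line such as Sitemap: https://example.com/sitemap.xml.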

B2) Indexing

Indexing can only happen after Google has crawled the page successfully. Google analyzes the crawled content (text, images, video) to understand it, and stores the result in the Google Index.

You can check whether a URL is in the Google Index by searching for site:<URL>, for example site:www.thegioididong.com/dtdd/iphone-13-256gb. If nothing is found, the URL is most likely not in the Google Index, and a URL that is not indexed cannot appear in Google's results for any keyword.

Improving indexing for your site

Follow the Webmaster Guidelines

Help Google find the content on your site (see B1) Crawling above).
Help Google understand the content of your site. There is a lot to cover here; a few key points:
Give each page a meaningful title that states its topic. Avoid obscure or careless titles such as "untitled 1" or "page1".
Make sure the main content is always the easiest thing to see. For example, on a Q&A page the question and answers should be easy to find, without the user having to do anything extra to see them.
Add descriptions to images and videos. Google can understand images and video, but descriptions help it understand them better. Where possible, present content as text rather than as images or video.

Follow the quality guidelines

Build your site for users, not for Google. Always look for ways to deliver value to users; ultimately, users are the reason a website exists.
Avoid tricks meant to deceive Google. Google keeps getting smarter, so deceiving it is increasingly difficult and will not pay off. A site caught doing it can be blacklisted and removed from search results.
Keep your site secure. Google cares a great deal about user safety; a site that is hacked or contains malware can quickly be dropped from the Index.

B3) Serving (and ranking)

When a user searches, Google looks through its Index for the content it judges most relevant, based on many factors.

Improving serving (and ranking)

Make sure your site loads quickly and is mobile-friendly. Site owners should review the user experience reports in Google Search Console (Page Experience, Core Web Vitals, Mobile Usability) to see where they stand and adjust promptly.
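The Core Web Vitals report buckets URLs as Good, Needs improvement, or Poor using field data at the 75th percentile. The Java sketch below only encodes the published thresholds, as I understand them, for the metrics in use when this post was written (LCP, FID, CLS); Google has since replaced FID with INP (good at or under 200 ms, poor above 500 ms), which is included for completeness. WebVitals and its methods are hypothetical names and the sample values are made up.

```java
public class WebVitals {
    enum Rating { GOOD, NEEDS_IMPROVEMENT, POOR }

    // Generic bucketing: <= goodMax is GOOD, > poorMin is POOR, anything in between needs improvement.
    static Rating rate(double value, double goodMax, double poorMin) {
        if (value <= goodMax) return Rating.GOOD;
        if (value > poorMin) return Rating.POOR;
        return Rating.NEEDS_IMPROVEMENT;
    }

    // Published thresholds (assumption: as documented by Google; FID was later replaced by INP).
    static Rating rateLcpSeconds(double lcp) { return rate(lcp, 2.5, 4.0); }
    static Rating rateFidMillis(double fid)  { return rate(fid, 100, 300); }
    static Rating rateCls(double cls)        { return rate(cls, 0.1, 0.25); }
    static Rating rateInpMillis(double inp)  { return rate(inp, 200, 500); }

    public static void main(String[] args) {
        // Hypothetical 75th-percentile field values for one URL.
        System.out.println("LCP 3.1 s -> " + rateLcpSeconds(3.1));  // NEEDS_IMPROVEMENT
        System.out.println("FID 80 ms -> " + rateFidMillis(80));    // GOOD
        System.out.println("CLS 0.31  -> " + rateCls(0.31));        // POOR
    }
}
```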

Make your site easy to use. Optimize page load speed: faster pages give users a better experience. Combine this with Google Analytics to measure and improve UX.

Site information such as the address, email, phone number, company name, author names, and customer support contacts shows that a website is run seriously and helps increase Google's trust in it.
Author information is also a factor in assessing content quality. For example, health articles written by doctors are rated more highly.
Links from other websites to yours also affect how your site is evaluated. For example, an education site linked from university websites will be rated highly. If a site is linked from many low-quality or unrelated websites, this may be treated as an attempt to manipulate search results and can be penalized.
Content tied to a geographic location may be prioritized in some contexts. For example, when a customer searches for "sửa cửa cuốn quận 7" (roller shutter repair, District 7), roller shutter repair sites that mention "quận 7" may be shown ahead of others.
Monitoring search performance on Google

The Performance report shows several related metrics, such as impressions, clicks, and average position.

From the report's queries, impressions, and clicks you can see what users are searching for and plan adjustments accordingly. For example, in the figure above the query "sách toán 8" (grade 8 math book) has many impressions but few clicks, so you could plan content that better meets that need.
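The "many impressions, few clicks" reasoning above amounts to a click-through-rate filter (CTR = clicks / impressions) over the Performance report's query table. A Java sketch, assuming you have exported the rows as (query, clicks, impressions); the Row class, thresholds, and numbers are hypothetical, and the queries are written without Vietnamese diacritics to keep the source ASCII.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class QueryCtr {
    // One row of an exported Performance report: query, clicks, impressions.
    static final class Row {
        final String query;
        final long clicks;
        final long impressions;

        Row(String query, long clicks, long impressions) {
            this.query = query;
            this.clicks = clicks;
            this.impressions = impressions;
        }

        // Click-through rate = clicks / impressions (0 when there were no impressions).
        double ctr() {
            return impressions == 0 ? 0.0 : (double) clicks / impressions;
        }
    }

    // Queries shown often (>= minImpressions) but rarely clicked (CTR < maxCtr),
    // largest impression count first.
    static List<Row> lowCtrQueries(List<Row> rows, long minImpressions, double maxCtr) {
        List<Row> result = new ArrayList<>();
        for (Row r : rows) {
            if (r.impressions >= minImpressions && r.ctr() < maxCtr) {
                result.add(r);
            }
        }
        result.sort(Comparator.comparingLong((Row r) -> r.impressions).reversed());
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical numbers, not from a real report.
        List<Row> rows = List.of(
                new Row("sach toan 8", 12, 4000),
                new Row("sach toan 7", 150, 1800),
                new Row("giai toan 8", 3, 90));
        for (Row r : lowCtrQueries(rows, 500, 0.02)) {
            System.out.printf(Locale.ROOT, "%s: %d impressions, CTR %.2f%%%n",
                    r.query, r.impressions, 100 * r.ctr());
        }
        // prints: sach toan 8: 4000 impressions, CTR 0.30%
    }
}
```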

From the keywords users search for, you can also research related keywords to guide how you expand your content. For example, searching for "sách toán 8" on Google shows related suggestions; these are keywords that users search for often.

Monitoring user behavior with Google Analytics

Once users arrive at your site, the work is not over. Google continually adjusts its search results based on user behavior. For example, if a user clicks a result but leaves immediately, that can signal that the page did not meet the user's need, and Google may adjust its results for that keyword. Site owners should analyze user behavior in Google Analytics reports so that problems are detected early, investigated, and fixed promptly.

Remarks

In my personal view, a site owner must meet two necessary conditions to rank highly in search results:

"Content is king". A website ultimately exists to be useful to its users. If a site brings users no value, it will eventually be weeded out. Google updates its search algorithms regularly, and every time it advises site owners to focus on delivering value to users.

The site must be easy for Google to access and its content easy for Google to understand. A site with great content that Google cannot reach or understand will have a hard time reaching users.

The sufficient condition is that your site provides better content or services than your competitors. If your site does not rank highly yet, you are not yet doing as well as your competitors, and you need to think about how to improve your content and services.

References

Search Engine Optimization (SEO) Starter Guide

Webmaster guidelines

Search Quality Evaluator Guidelines

Discussion

That is what I have learned on my own so far; please go easy on me :D

Tuesday, October 19, 2021

curl: profiling connection time

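Print the time to establish the connection, the time to first byte (TTFB), and the total time. -s hides the progress meter, -w prints the timing variables after the transfer, and -d sends the data as a POST body. All three times are cumulative, measured from the start of the request.

curl -v "https://xxx/app/api/" -s -w '\nEstablish Connection: %{time_connect}s\nTTFB: %{time_starttransfer}s\nTotal: %{time_total}s\n' -d "userID=1572101002801152000&sig=4c411914995b96e3417f6930c3e2927b"

Print only the total time:

curl -v "https://xxx/app/api/" -s -w '\n\nTotal: %{time_total}s\n' -d "userID=1572101002801152000&sig=4c411914995b96e3417f6930c3e2927b"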
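For reference, the same three measurements can be taken by hand, which makes it clearer what each curl variable covers. A Java sketch, assuming a plain http:// URL (no TLS, so there is no handshake phase like curl's %{time_appconnect} for https), no proxy, and no redirects; HttpTiming and its fields are illustrative names, not part of curl or the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class HttpTiming {
    // Cumulative seconds from the start of the request, analogous to curl's
    // %{time_connect}, %{time_starttransfer} and %{time_total}.
    static final class Timing {
        final double connect;
        final double startTransfer;
        final double total;
        final long bytes;

        Timing(double connect, double startTransfer, double total, long bytes) {
            this.connect = connect;
            this.startTransfer = startTransfer;
            this.total = total;
            this.bytes = bytes;
        }
    }

    // Sends one HTTP/1.1 GET over a raw socket and records when each phase finished.
    static Timing measure(String url) throws IOException {
        URI uri = URI.create(url);
        if (!"http".equals(uri.getScheme())) {
            throw new IllegalArgumentException("this sketch only supports http:// URLs");
        }
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        String path = (uri.getRawPath() == null || uri.getRawPath().isEmpty()) ? "/" : uri.getRawPath();
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        String host = uri.getHost() + (uri.getPort() == -1 ? "" : ":" + port);

        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            // The DNS lookup happens in InetSocketAddress, so it counts toward connect time, as in curl.
            socket.connect(new InetSocketAddress(uri.getHost(), port), 10_000);
            long connected = System.nanoTime();

            OutputStream out = socket.getOutputStream();
            String request = "GET " + path + " HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n";
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            int first = in.read();               // blocks until the first response byte arrives
            long firstByte = System.nanoTime();
            long bytes = (first == -1) ? 0 : 1;
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {   // read until the server closes the connection
                bytes += n;
            }
            long end = System.nanoTime();
            return new Timing(seconds(connected - start), seconds(firstByte - start),
                    seconds(end - start), bytes);
        }
    }

    private static double seconds(long nanos) {
        return nanos / 1e9;
    }

    public static void main(String[] args) throws IOException {
        Timing t = measure(args.length > 0 ? args[0] : "http://example.com/");
        System.out.printf("Establish Connection: %.6fs%nTTFB: %.6fs%nTotal: %.6fs%n",
                t.connect, t.startTransfer, t.total);
    }
}
```

Run it as java HttpTiming http://host/path; the byte count includes the response headers.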

Friday, October 8, 2021

Building gnome-screenshot on Ubuntu 20

Install libhandy from the PPA (the build needs the -dev package), then configure and build:

sudo add-apt-repository ppa:apandada1/libhandy-1
sudo apt-get install libhandy-1-0 libhandy-1-dev
meson compile
cd compile/
ninja

Note: Meson versions before 0.54 (Ubuntu 20.04 ships 0.53) have no compile subcommand, so "meson compile" is read as "meson setup compile" and configures a build directory named compile. On Meson 0.54 or later, run "meson setup compile" instead.

Run the result from ~/NetBeansProjects/gnome-screenshot-master/compile/src:

./gnome-screenshot --interactive

Rebuild from scratch and run, from ~/NetBeansProjects/gnome-screenshot-master:

rm -r compile && meson compile && cd compile && ninja && ll src/ && cd ../ && ./compile/src/gnome-screenshot --interactive

Sunday, June 13, 2021

Print the last k digits of a power - JAVA

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PermuteSort {

    // Returns a copy of str with the characters at positions i and j swapped.
    static String swap(String str, int i, int j) {
        char[] array = str.toCharArray();
        char ch = array[i];
        array[i] = array[j];
        array[j] = ch;
        return String.valueOf(array);
    }

    // Prints all permutations of str[low..high]: swap each candidate into position low,
    // recurse on the rest, then swap back (backtracking).
    static void permute(String str, int low, int high) {
        if (low == high) {
            System.out.println(str);
            return;
        }
        for (int i = low; i <= high; i++) {
            str = swap(str, low, i);
            permute(str, low + 1, high);
            str = swap(str, low, i);
        }
    }

    // Swaps the elements at positions i and j of the list in place.
    static void swap(List<Character> chars, int i, int j) {
        char ch = chars.get(i);
        chars.set(i, chars.get(j));
        chars.set(j, ch);
    }

    // Same algorithm, working in place on a list of characters.
    static void permute(List<Character> chars, int low, int high) {
        if (low == high) {
            System.out.println(Arrays.toString(chars.toArray()));
            return;
        }
        for (int i = low; i <= high; i++) {
            swap(chars, low, i);
            permute(chars, low + 1, high);
            swap(chars, low, i);
        }
    }

    public static void main(String[] args) {
        String s = "dfgpvwz";
        List<Character> chars = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            chars.add(s.charAt(i));
        }
        // Prints all 7! = 5040 permutations of "dfgpvwz".
        permute(chars, 0, chars.size() - 1);
    }
}
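The listing above prints permutations; for the problem named in the title, the usual approach is modular exponentiation: compute a^n mod 10^k with square-and-multiply, so the full power is never built. A Java sketch, assuming a non-negative base and exponent and 1 ≤ k ≤ 9 so that every intermediate product fits in a long; LastKDigits and its methods are hypothetical names. The result is zero-padded to k characters, which is the right reading when a^n has at least k digits.

```java
import java.util.Locale;

public class LastKDigits {
    // base^exp mod m by binary exponentiation (square-and-multiply): O(log exp) multiplications.
    static long modPow(long base, long exp, long m) {
        long result = 1 % m;
        long b = base % m;
        long e = exp;
        while (e > 0) {
            if ((e & 1) == 1) {
                result = result * b % m;   // include this power of two in the result
            }
            b = b * b % m;                 // b = base^(2^i) mod m
            e >>= 1;
        }
        return result;
    }

    // Last k digits of base^exp, zero-padded to k characters.
    // Assumes base >= 0, exp >= 0 and 1 <= k <= 9, so every product stays below 10^18.
    static String lastKDigits(long base, long exp, int k) {
        if (k < 1 || k > 9) {
            throw new IllegalArgumentException("k must be between 1 and 9");
        }
        long m = 1;
        for (int i = 0; i < k; i++) {
            m *= 10;
        }
        return String.format(Locale.ROOT, "%0" + k + "d", modPow(base, exp, m));
    }

    public static void main(String[] args) {
        System.out.println(lastKDigits(2, 10, 3));   // 2^10 = 1024, prints 024
        System.out.println(lastKDigits(7, 222, 4));
    }
}
```

For k above 9, java.math.BigInteger.modPow does the same job without the overflow limit.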

Generate the k-permutations of N elements in sorted order - JAVA

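A k-permutation of N elements is an ordered selection of k distinct elements, and there are N!/(N-k)! of them. A Java sketch that emits them in lexicographic order, assuming the elements are distinct: sort the input, then build each prefix by trying the unused elements from left to right (backtracking). Unlike the swap-based permute in the previous listing, this never reorders the remaining elements, so the output stays sorted. KPermutations and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KPermutations {
    // Returns every k-permutation of the (distinct) input characters in lexicographic order.
    static List<String> kPermutations(char[] items, int k) {
        char[] sorted = items.clone();
        Arrays.sort(sorted);                                // sorted input gives sorted output
        List<String> out = new ArrayList<>();
        backtrack(sorted, k, new boolean[sorted.length], new StringBuilder(), out);
        return out;
    }

    // Extends prefix with each unused element in ascending order until it has k elements.
    private static void backtrack(char[] items, int k, boolean[] used,
                                  StringBuilder prefix, List<String> out) {
        if (prefix.length() == k) {
            out.add(prefix.toString());
            return;
        }
        for (int i = 0; i < items.length; i++) {
            if (used[i]) continue;
            used[i] = true;
            prefix.append(items[i]);
            backtrack(items, k, used, prefix, out);
            prefix.deleteCharAt(prefix.length() - 1);       // undo the choice
            used[i] = false;
        }
    }

    public static void main(String[] args) {
        System.out.println(kPermutations("cab".toCharArray(), 2));             // [ab, ac, ba, bc, ca, cb]
        System.out.println(kPermutations("dfgpvwz".toCharArray(), 3).size());  // 210 = 7!/4!
    }
}
```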

Thursday, January 9, 2020

The basics of InnoDB space file layout

In “On learning InnoDB: A journey to the core”, I introduced the innodb_diagrams project to document the InnoDB internals, which provides the diagrams used in this post.

InnoDB’s data storage model uses “spaces”, often called “tablespaces” in the context of MySQL, and sometimes called “file spaces” in InnoDB itself. A space may consist of multiple actual files at the operating system level (e.g. ibdata1, ibdata2, etc.) but it is just a single logical file — multiple physical files are just treated as though they were physically concatenated together.

Each space in InnoDB is assigned a 32-bit integer space ID, which is used in many different places to refer to the space. InnoDB always has a “system space”, which is always assigned the space ID of 0. The system space is used for various special bookkeeping that InnoDB requires. Through MySQL, InnoDB currently only supports additional spaces in the form of “file per table” spaces, which create an .ibd file for each MySQL table. Internally, this .ibd file is actually a fully functional space which could contain multiple tables, but in the implementation with MySQL, they will only contain a single table.

Pages

Each space is divided into pages, normally 16 KiB each (this can differ for two reasons: if the compile-time define UNIV_PAGE_SIZE is changed, or if InnoDB compression is used). Each page within a space is assigned a 32-bit integer page number, often called “offset”, which is actually just the page’s offset from the beginning of the space (not necessarily the file, for multi-file spaces). So, page 0 is located at file offset 0, page 1 at file offset 16384, and so on. (The astute may remember that InnoDB has a limit of 64TiB of data; this is actually a limit per space, and is due primarily to the page number being a 32-bit integer combined with the default page size: 2^32 x 16 KiB = 64 TiB.)
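The page-number arithmetic above can be checked directly. A Java sketch, assuming the default 16 KiB page size and an uncompressed space; offsets are into the logical space, which for a multi-file system space is not the same as an offset into one physical file. InnoDbPageMath is an illustrative name.

```java
public class InnoDbPageMath {
    static final long PAGE_SIZE = 16 * 1024;      // default page size, 16 KiB
    static final long MAX_PAGES = 1L << 32;       // page numbers are 32-bit unsigned integers

    // Byte offset of a page from the beginning of the (logical) space.
    static long pageOffset(long pageNumber) {
        if (pageNumber < 0 || pageNumber >= MAX_PAGES) {
            throw new IllegalArgumentException("page number out of range: " + pageNumber);
        }
        return pageNumber * PAGE_SIZE;
    }

    // Page that contains a given byte offset in the space.
    static long pageNumberAt(long spaceOffset) {
        return spaceOffset / PAGE_SIZE;
    }

    // Largest possible space: 2^32 pages x 16 KiB = 2^46 bytes.
    static long maxSpaceBytes() {
        return MAX_PAGES * PAGE_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(pageOffset(1));                           // 16384
        System.out.println(pageNumberAt(50_000));                    // 3
        System.out.println(maxSpaceBytes() / (1L << 40) + " TiB");   // 64 TiB
    }
}
```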

A page is laid out as follows:

Every page has a 38-byte FIL header and 8-byte FIL trailer (FIL is a shortened form of “file”). The header contains a field which is used to indicate the page type, which determines the structure of the rest of the page. The structure of the FIL header and trailer is:

The FIL header and trailer contain the following structures (not in order):

The page type is stored in the header. This is necessary in order to parse the rest of the page data. Pages are allocated for file space management, extent management, the transaction system, the data dictionary, undo logs, blobs, and of course indexes (table data).
The space ID is stored in the header.
The page number is stored in the header once the page has been initialized. Checking that the page number read from that field matches what it should be based on the offset into the file is helpful to indicate that reading is correct, and this field being initialized indicates that the page has been initialized.
A 32-bit checksum is stored in the header, and an older-format (and broken) 32-bit checksum is stored in the trailer. The older checksum could be deprecated and its space reclaimed at some point.
Pointers to the logical previous and next page for this page type are stored in the header. This allows doubly-linked lists of pages to be built, and this is used for INDEX pages to link all pages at the same level, which allows for e.g. full index scans to be efficient. Many page types do not use these fields.
The 64-bit log sequence number (LSN) of the last modification of the page is stored in the header, and the low 32-bits of the same LSN are stored in the trailer.
A 64-bit “flush LSN” field is stored in the header, which is actually only populated for a single page in the entire system, page 0 of space 0. This stores the highest LSN flushed to any page in the entire system (all spaces). This field is a great candidate for re-use in the rest of the space.
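To make the header layout concrete, here is a Java sketch that reads the FIL header and trailer from one page's bytes. The byte offsets are my understanding of InnoDB's layout (checksum at 0, page number at 4, previous and next page at 8 and 12, LSN at 16, page type at 24, flush LSN at 26, space ID at 34, 38 bytes in total), with all fields stored big-endian. The demo builds a synthetic page rather than reading a real .ibd file, and 17855 is the INDEX page type value as defined in InnoDB's fil0fil.h. FilHeader is an illustrative name.

```java
import java.nio.ByteBuffer;

public class FilHeader {
    static final int PAGE_SIZE = 16 * 1024;
    static final int FIL_TRAILER_SIZE = 8;

    final long checksum, pageNumber, prevPage, nextPage, spaceId;
    final long lsn, flushLsn;
    final int pageType;
    final long oldChecksum, trailerLsnLow32;

    // Reads the FIL header and trailer of one page. ByteBuffer is big-endian by default,
    // matching how InnoDB stores these fields.
    FilHeader(byte[] page) {
        ByteBuffer b = ByteBuffer.wrap(page);
        checksum   = Integer.toUnsignedLong(b.getInt(0));
        pageNumber = Integer.toUnsignedLong(b.getInt(4));
        prevPage   = Integer.toUnsignedLong(b.getInt(8));
        nextPage   = Integer.toUnsignedLong(b.getInt(12));
        lsn        = b.getLong(16);
        pageType   = Short.toUnsignedInt(b.getShort(24));
        flushLsn   = b.getLong(26);
        spaceId    = Integer.toUnsignedLong(b.getInt(34));
        int t = page.length - FIL_TRAILER_SIZE;
        oldChecksum     = Integer.toUnsignedLong(b.getInt(t));
        trailerLsnLow32 = Integer.toUnsignedLong(b.getInt(t + 4));
    }

    // Sanity checks described above: the stored page number should match the page's position
    // in the space, and the trailer should hold the low 32 bits of the header LSN.
    boolean looksConsistent(long expectedPageNumber) {
        return pageNumber == expectedPageNumber && trailerLsnLow32 == (lsn & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        // Synthetic, mostly empty page for demonstration (not a real InnoDB page).
        ByteBuffer b = ByteBuffer.allocate(PAGE_SIZE);
        b.putInt(4, 3);                         // page number
        b.putLong(16, 0x1_2345_6789L);          // LSN of last modification
        b.putShort(24, (short) 17855);          // page type: INDEX
        b.putInt(34, 42);                       // space ID
        b.putInt(PAGE_SIZE - 4, 0x2345_6789);   // low 32 bits of the LSN, in the trailer
        FilHeader h = new FilHeader(b.array());
        System.out.println("page " + h.pageNumber + " type " + h.pageType + " space " + h.spaceId
                + " consistent=" + h.looksConsistent(3));
        // prints: page 3 type 17855 space 42 consistent=true
    }
}
```

With a real file, you would read 16,384 bytes at offset n x 16384 (for example with FileChannel.read(ByteBuffer, long)) and pass them to the constructor.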

Space files

A space file is just a concatenation of many (up to 2^32) pages. For more efficient management, pages are grouped into blocks of 1 MiB (64 contiguous pages with the default page size of 16 KiB), and called an “extent”. Many structures then refer only to extents to allocate pages within a space.

InnoDB needs to do some bookkeeping to keep track of all of the pages, extents, and the space itself, so a space file has some mandatory super-structure:

The first page (page 0) in a space is always an FSP_HDR or “file space header” page. The FSP_HDR page contains (confusingly) an FSP header structure, which tracks things like the size of the space and lists of free, fragmented, and full extents. (A more detailed discussion of free space management is reserved for a future post.)

An FSP_HDR page only has enough space internally to store bookkeeping information for 256 extents (or 16,384 pages, 256 MiB), so additional space must be reserved every 16,384 pages for bookkeeping information in the form of an XDES page. The structure of XDES and FSP_HDR pages is identical, except that the FSP header structure is zeroed-out in XDES pages. These additional pages are allocated automatically as a space file grows.
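The 64-pages-per-extent and 16,384-pages-per-descriptor numbers determine where a page's bookkeeping entry lives. A Java sketch, assuming the default 16 KiB page size (both constants change with other page sizes); ExtentMath and its methods are illustrative names.

```java
public class ExtentMath {
    static final long PAGES_PER_EXTENT = 64;        // 1 MiB / 16 KiB
    static final long PAGES_PER_DESCRIPTOR = 16_384; // 256 extents per FSP_HDR/XDES page

    // Extent that a page belongs to (extents are 64 contiguous pages).
    static long extentOf(long pageNumber) {
        return pageNumber / PAGES_PER_EXTENT;
    }

    // FSP_HDR (for the first 16,384 pages) or XDES page holding this page's extent descriptor.
    static long descriptorPageOf(long pageNumber) {
        return pageNumber / PAGES_PER_DESCRIPTOR * PAGES_PER_DESCRIPTOR;
    }

    // Position of this page's extent among the 256 entries in that descriptor page.
    static long descriptorIndexOf(long pageNumber) {
        return (pageNumber % PAGES_PER_DESCRIPTOR) / PAGES_PER_EXTENT;
    }

    public static void main(String[] args) {
        long page = 40_000;
        System.out.println("extent " + extentOf(page));                 // extent 625
        System.out.println("descriptor page " + descriptorPageOf(page)); // descriptor page 32768
        System.out.println("entry " + descriptorIndexOf(page));         // entry 113
    }
}
```

Page 32768 is the second XDES page in the space, and page 40,000 falls in its 114th extent entry (index 113).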

The third page in each space (page 2) will be an INODE page, which is used to store lists related to file segments (groupings of extents plus an array of singly-allocated “fragment” pages). Each INODE page can store 85 INODE entries, and each index requires two INODE entries. (A more detailed discussion of INODE entries and file segments is reserved for a future post.)

Alongside each FSP_HDR or XDES page will also be an IBUF_BITMAP page, which is used for bookkeeping information related to insert buffering, and is outside the scope of this post.

The system space

The system space (space 0) is special in InnoDB, and contains quite a few pages allocated at fixed page numbers to store a wide range of information critical to InnoDB’s operation. Since the system space is a space like any other, it has the required FSP_HDR, IBUF_BITMAP, and INODE pages allocated as its first three pages. After that, it is a bit special:

The following pages are allocated:

Page 3, type SYS: Headers and bookkeeping information related to insert buffering.
Page 4, type INDEX: The root page of the index structure used for insert buffering.
Page 5, type TRX_SYS: Information related to the operation of InnoDB’s transaction system, such as the latest transaction ID, MySQL binary log information, and the location of the double write buffer extents.
Page 6, type SYS: The first rollback segment page. Additional pages (or whole extents) are allocated as needed to store rollback segment data.
Page 7, type SYS: Headers related to the data dictionary, containing root page numbers for the indexes that make up the data dictionary. This information is required to be able to find any other indexes (tables), as their root page numbers are stored in the data dictionary itself.
Pages 64-127: The first block of 64 pages (an extent) in the double write buffer. The double write buffer is used as part of InnoDB’s recovery mechanism.
Pages 128-191: The second block of the double write buffer.

All other pages are allocated on an as-needed basis to indexes, rollback segments, undo logs, etc.

Per-table space files

InnoDB offers a “file per table” mode, which will create a file (which as explained above is actually a space) for each MySQL table created. A better name for this feature may be “space per table” rather than “file per table”. The .ibd file created for each table has the typical space file structure:

Ignoring “fast index creation” which adds indexes at runtime, after the requisite 3 initial pages, the next pages allocated in the space will be the root pages of each index in the table, in the order they were defined in the table creation. Page 3 will be the root of the clustered index, Page 4 will be the root of the first secondary key, etc.

Since most of InnoDB’s bookkeeping structures are stored in the system space, most pages allocated in a per-table space will be of type INDEX and store table data.

https://blog.jcole.us/2013/01/03/the-basics-of-innodb-space-file-layout/