JSON创建索引概述 对于JSON数据来说并没有什么特殊的SQL数据类型,所以创建索引的方式也并无特别之处。可以对json_value
函数返回值创建B-tree索引或者位图索引,也可以对is json
、js not json
、json_exists
等函数创建位图索引。
一般来说,位图索引通常是创建在那些唯一值很少的字段上,比如性别之类。
函数索引只会在对应函数的查询sql上才会生效,这在具体使用中无可避免存在一些局限性,所以在创建这些函数索引之前需要充分了解到业务经常使用的场景。
对于那些查询场景不确定的情况也有一种对应方式,那就是JSON search index
,这是一种常规索引,不局限于特定的JSON路径表达式。它非常适合结构式的查询,比如查找JSON字段中的具体值,并且通过json_textcontains
来进行全文查询,在各种字符串值中寻找特定的单词。
对于同一个JSON列来说,可以同时创建函数索引和全文索引,这里要注意的是JSON search index是专门用来对应JSON数据的全文索引。
Oracle建议使用AL32UTF8字符集,否则创建和使用索引时会自动的进行字符集的转换,而这种转换是有损的,也就表示某些数据查询的时候会无法返回
创建一张测试JSON表
1 2 3 4 5 6 7 drop table json_test purge; create table json_test ( id raw(16) not null, document clob CONSTRAINT ensure_json CHECK (document IS JSON) );
插入测试数据
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 INSERT INTO json_test VALUES (SYS_GUID(), '{ "PONumber" : 1600, "Reference" : "ABULL-20140421", "Requestor" : "Alexis Bull", "User" : "ABULL", "CostCenter" : "A50", "ShippingInstructions" : { "name" : "Alexis Bull", "Address": { "street" : "200 Sporting Green", "city" : "South San Francisco", "state" : "CA", "zipCode" : 99236, "country" : "United States of America" }, "Phone" : [ { "type" : "Office", "number" : "909-555-7307" }, { "type" : "Mobile", "number" : "415-555-1234" } ] }, "Special Instructions" : null, "AllowPartialShipment" : false, "LineItems" : [ { "ItemNumber" : 1, "Part" : { "Description" : "One Magic Christmas", "UnitPrice" : 19.95, "UPCCode" : 13131092899 }, "Quantity" : 9.0 }, { "ItemNumber" : 2, "Part" : { "Description" : "Lethal Weapon", "UnitPrice" : 19.95, "UPCCode" : 85391628927 }, "Quantity" : 5.0 } ] }');
创建位图索引 创建基于json_exists返回数据的位图索引,这种索引就非常适合json_exists,因为返回的结果只有true或者false
1 2 3 4 create bitmap index idx_name on json_test (json_exists(document, '$.ShippingInstructions.name')); Index created.
创建基于json_value函数返回数据的位图索引,这种就适合字段里只有少量可能的数据的情况
1 2 3 4 drop index idx_name; create bitmap index idx_name on json_test (json_value(document, '$.ShippingInstructions'));
创建JSON_VALUE函数索引 可以创建基于JSON_VALUE的函数索引,标准语法即可,显式的指定函数json_value,或者直接简单的用点表示法的语法。这些方法创建的索引对基于点表示法查询和json_value查询都可应用。
1 CREATE UNIQUE INDEX po_num_idx ON json_test po (po.document.PONumber);
这里基于点表示法创建了一个函数索引,另外也可以通过json_value函数来创建基于POnumber字段的索引
1 2 create unique index po_num_idx2 on json_test (json_value(document, '$.PONumber' returning number error on error null on empty));
注意这里的ERROR ON ERROR
表示如果数据包含不存在的POnumber的记录、有多个PONumber、或者有个PONumber不是数字型则索引就会创建失败,如果这个索引已经存在,则不符合要求的数据会插入失败。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 set autotrace traceonly select a.document."User",a.document."CostCenter" from json_test a where json_value(document, '$.PONumber' returning number error on error null on empty) = 1600; Execution Plan ---------------------------------------------------------- Plan hash value: 1738154326 -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 8168 | 27M| 30 (0)| 00:00:01 | | 1 | NESTED LOOPS | | 8168 | 27M| 30 (0)| 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID| JSON_TEST | 1 | 3489 | 1 (0)| 00:00:01 | |* 3 | INDEX UNIQUE SCAN | PO_NUM_IDX2 | 1 | | 1 (0)| 00:00:01 | | 4 | JSONTABLE EVALUATION | | | | | | -------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(JSON_VALUE("DOCUMENT" /*+ LOB_BY_VALUE */ FORMAT JSON , '$.POnumber' RETURNING NUMBER ERROR ON ERROR NULL ON EMPTY)=1600)
如果使用点表示法来查询,则同样可以使用这个索引
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 select a.document."User",a.document."CostCenter" from json_test a where a.document."PONumber" = 1600; Execution Plan ---------------------------------------------------------- Plan hash value: 1738154326 -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 8168 | 27M| 30 (0)| 00:00:01 | | 1 | NESTED LOOPS | | 8168 | 27M| 30 (0)| 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID| JSON_TEST | 1 | 3489 | 1 (0)| 00:00:01 | |* 3 | INDEX UNIQUE SCAN | PO_NUM_IDX2 | 1 | | 1 (0)| 00:00:01 | | 4 | JSONTABLE EVALUATION | | | | | | -------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(JSON_VALUE("DOCUMENT" /*+ LOB_BY_VALUE */ FORMAT JSON , '$.PONumber' RETURNING NUMBER ERROR ON ERROR NULL ON EMPTY)=1600)
这里可以看到,尽管使用点表示法时,但是谓词仍将其看做JSON_VALUE格式的数据而使用json_value的索引,并没有看见其进行隐式转换成JSON_QUERY,这里主要是优化器进行了更优解而选择了json_value表达式的索引。
而当我们不使用这个索引时,就会看到预期的json_query转换
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 drop index po_num_idx2; select a.document."User",a.document."CostCenter" from json_test a where a.document."PONumber" = '1600'; Execution Plan ---------------------------------------------------------- Plan hash value: 4223338378 ------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 8168 | 27M| 30 (0)| 00:00:01 | | 1 | NESTED LOOPS | | 8168 | 27M| 30 (0)| 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID| JSON_TEST | 1 | 3496 | 1 (0)| 00:00:01 | |* 3 | INDEX UNIQUE SCAN | PO_NUM_IDX | 1 | | 1 (0)| 00:00:01 | | 4 | JSONTABLE EVALUATION | | | | | | ------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(JSON_QUERY("DOCUMENT" /*+ LOB_BY_VALUE */ FORMAT JSON , '$.PONumber' RETURNING VARCHAR2(4000) ASIS WITHOUT ARRAY WRAPPER NULL ON ERROR)='1600')
Oracle建议使用下面的方式对json_value创建函数索引
点表示法
索引值可以灵活对应点表示法的查询,无论何时都会返回JSON数据。包含非标量的JSON数据,可以匹配除了json_value查询的点表示法查询。
json_value表达式,指定RETURNING
数据类型,使用ERROR ON ERROR
(可选NULL ON EMPTY
)
索引值只有指定数据类型的标量数据,但是仍可以适用于那些返回标量数据的点表示法查询。
JSON_VALUE索引和查询的数据类型注意事项 默认情况下,json_value函数返回的是varchar2类型数据,当创建一个基于json_value的函数索引时,除非指定RETURNING
语句返回一个不同类型的数据,否则当谓词条件是一个非varchar2值的时候,就并不会适用索引。
返回值为varchar2类型,谓词为number型
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 select a.document."User",a.document."CostCenter" from json_test a where json_value(document, '$.PONumber' error on error null on empty) = 1600; Execution Plan ---------------------------------------------------------- Plan hash value: 2422004586 ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 82 | 278K| 32 (0)| 00:00:01 | | 1 | NESTED LOOPS | | 82 | 278K| 32 (0)| 00:00:01 | | 2 | TABLE ACCESS FULL | JSON_TEST | 1 | 3476 | 3 (0)| 00:00:01 | |* 3 | JSONTABLE EVALUATION | | | | | | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter(TO_NUMBER("P"."C_01$")=1600)
多JSON字段创建复合索引 当对JSON对象的多个字段建复合索引时,首选需要对其创建虚拟列,然后在虚拟列上创建B-tree索引。
1 2 3 4 5 6 7 alter table json_test add (userid varchar2(50) generated always as (json_value(document, '$.User' returning varchar2(50)))); alter table json_test add (cost varchar2(50) generated always as (json_value(document, '$.CostCenter' returning varchar2(50)))); create index idx_json on json_test(userid,cost);
两种方式查询可以用到复合索引
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 select document from json_test where userid='ABULL' and cost='A50'; Execution Plan ---------------------------------------------------------- Plan hash value: 521514918 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3496 | 1 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| JSON_TEST | 1 | 3496 | 1 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX_JSON | 1 | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("USERID"='ABULL' AND "COST"='A50')
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 select document from json_test where json_value(document,'$.User' returning varchar2(50))='ABULL' and json_value(document,'$.CostCenter' returning varchar2(50))='A50'; Execution Plan ---------------------------------------------------------- Plan hash value: 521514918 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3496 | 1 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| JSON_TEST | 1 | 3496 | 1 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX_JSON | 1 | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("JSON_TEST"."USERID"='ABULL' AND "JSON_TEST"."COST"='A50')
JSON全文索引 JSON search索引是一个常规索引,它可以提高临时结构查询和全文搜索的性能。 这是一个Oracle Text索引,专门设计用于JSON数据。
如果你使用12cR1版本创建过JSON search索引,则在新版本之后建议drop后重建
JSON search index
通过关键字FOR JSON
来执行CREATE SEARCH INDEX
语句进行索引的创建,JSON全文索引的维护是异步的,这样可以避免在业务高峰期的索引维护成本,而选择在业务低谷的时候进行这些消耗资源的操作。异步维护也就意味着在索引同步之前,那些被修改的数据或新插入的数据并不会在索引里存在记录。
创建JSON全文索引
1 create search index idx_search on json_test(document) for json;
当JSON全文索引创建完毕以后,就可以使用json_textcontains
来进行查询
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 select document from json_test where json_textcontains(document, '$.LineItems.Part.Description', 'Magic'); Execution Plan ---------------------------------------------------------- Plan hash value: 1340174430 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 3488 | 1 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| JSON_TEST | 1 | 3488 | 1 (0)| 00:00:01 | |* 2 | DOMAIN INDEX | IDX_SEARCH | | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CTXSYS"."CONTAINS"("JSON_TEST"."DOCUMENT",'Magic INPATH (/LineItems/Part/Description)')>0)
执行计划中出现了DOMAIN INDEX表示sql使用了全文索引。
上面的例子仅查询了描述字段里出现Magic的情况,而比如想查询Description同时包含Magic
和Christmas
的情况则需要使用操作符&
或and
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 select document from json_test where json_textcontains(document, '$.LineItems.Part.Description', 'Magic \& Christmas'); { "PONumber" : 1600, "Reference" : "ABULL-20140421", "Requestor" : "Alexis Bull", "User" : "ABULL", "CostCenter" : "A50", "ShippingInstructions" : { "name" : "Alexis Bull", "Address": { "street" : "200 Sporting Green", "city" : "South San Francisco", "state" : "CA", "zipCode" : 99236, "country" : "United States of America" }, "Phone" : [ { "type" : "Office", "number" : "909-555-7307" }, { "type" : "Mobile", "number" : "415-555-1234" } ] }, "Special Instructions" : null, "AllowPartialShipment" : false, "LineItems" : [ { "ItemNumber" : 1, "Part" : { "Description" : "One Magic Christmas", "UnitPrice" : 19.95, "UPCCode" : 13131092899 }, "Quantity" : 9.0 }, { "ItemNumber" : 2, "Part" : { "Description" : "Lethal Weapon", "UnitPrice" : 19.95, "UPCCode" : 85391628927 }, "Quantity" : 5.0 } ] }
这里使用了反斜杠转义&
。
也可以用and,但是注意这里查询包含保留字and,所以需要加上花括号{}
1 2 select document from json_test where json_textcontains(document, '$.LineItems.Part.Description', '{Magic and Christmas}');
JSON全文索引同样支持那些不包含json_textcontains的查询
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 SELECT COUNT(*) FROM json_test WHERE JSON_EXISTS(document, '$.Reference'); Execution Plan ---------------------------------------------------------- Plan hash value: 2409884336 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3488 | 1 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3488 | | | |* 2 | DOMAIN INDEX | IDX_SEARCH | | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CTXSYS"."CONTAINS"("JSON_TEST"."DOCUMENT",'HASPATH(/Refer ence)')>0)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 SELECT COUNT(*) FROM json_test where json_value(document, '$.ShippingInstructions.name') = 'Alexis Bull'; Execution Plan ---------------------------------------------------------- Plan hash value: 3056167535 ------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3488 | 1 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3488 | | | |* 2 | TABLE ACCESS BY INDEX ROWID| JSON_TEST | 1 | 3488 | 1 (0)| 00:00:01 | |* 3 | DOMAIN INDEX | IDX_SEARCH | | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(JSON_VALUE("DOCUMENT" FORMAT JSON , '$.ShippingInstructions.name' RETURNING VARCHAR2(4000) NULL ON ERROR)='Alexis Bull') 3 - access("CTXSYS"."CONTAINS"("JSON_TEST"."DOCUMENT",'{Alexis Bull} INPATH (/ShippingInstructions/name)')>0)